In this paper, we attempt to examine whether the use of change-point analysis techniques is appropriate for detecting fatigue based on data captured from wearable sensors. As such, we perform a secondary data analysis to the data generated in: Baghdadi et al., 2018. The reader should note that their raw data was preprocessed using:

  1. Kalman Filter: Used to process the raw data from sensors to: (i) estimate the spatial orientation of the body with respect to the global reference frame, and (ii) to estimate the kinematics of motion.

  2. Segmentation: The motion segments where then segmented using an algorithm that assumes the existence of two peaks in the translational acceleration of the gait cycle. This assumption was justified based on the results of Tongen and Wunderlich, 2010 as well as the authors’ preliminary analyses.

The reader can show any code chunk by clicking on the code button. We chose to make the default for the code hidden since we: (a) wanted to improve the readability of this document; and (b) assumed that the readers will not be interested in reading every code chunk.


1 Loading Data & Generating Features

The snippet below documents the list of R libraries that were used in this research. For convenience, we used the pacman package since it allows for installing/loading the needed libraries in one step.

rm(list = ls()) # clear global environment
graphics.off() # close all graphics
library(pacman) # needs to be installed first
p_load(R.matlab, plotly, extrafont, grDevices, gridExtra,
       dplyr, stringr, tidyverse, utils, reshape2,
       anomalize, forecast, MVN, fractal,
       ecp, dfphase1, 
       MALDIquant, TSclust,
       knitr, kableExtra)  

In the snippet below, we extract the 15 “.mat” files in the GitHub repository (where we loaded the data to allow for the reproduction of our work). Note that these files were originally produced in Baghdadi et al., 2018. Then, we perform several transformation steps: (a) extracting the data for the first three columns in the matlab arrays; and (b) computing three kinematic features from the data corresponding to these columns. Due to the differences between Matlab and R, this requires two nested for loops. The outer loop increments over the number of subjects, while the inner loop increments based on the different number of rows of data for each subject. Please see the comments within the code chunk for more details.

num_subjects <- seq(1, 15)
subject_heights <- c(1.71, 1.77, 1.71, 1.59, 1.69,
                     1.63, 1.60, 1.71, 1.67, 1.78,
                     1.68, 1.55, 1.83, 1.81, 1.89)

# Initilizing a df for summary data on participants
summary_df <- data.frame(matrix(nrow = 15, ncol = 9))
colnames(summary_df) <- c("Subject.Num", "num.rows",
                          "num.cols", "mean.scaled.stride.len",
                          "sd.scaled.stride.len",
                          "mean.scaled.stride.height",
                          "sd.scaled.stride.height",
                          "mean.stride.duration",
                          "sd.stride.duartion")

for (i in 1:length(num_subjects)) {
  # Reading the .mat files from GitHub
  raw_data <- readMat(paste0("https://github.com/fmegahed/fatigue-changepoint/blob/master/Data/Raw/Subject",num_subjects[i],".mat?raw=true"))
  # Compute the number of cells, and rows in each structered matrix
  raw_data_size <- lengths(raw_data) # num of cells
  num_rows <- raw_data_size / 17 # all data had 17 cols
  # Initilizing the six lists needed for storing the data (we keep track of the top 3 for error checking)
  time_in_sec <- vector("list", length = num_rows)
  position_x <- vector("list", length = num_rows)
  position_y <- vector("list", length = num_rows)
  stride_time <- vector("list", length = num_rows)
  stride_length <- vector("list", length = num_rows)
  stride_height <- vector("list", length = num_rows)
  stride_duration <- vector("list", length = num_rows)
  
  # Following for loop is needed since R reads the structured array as a nested list. The list containing the data is called "M.i.k" and it transforms/reads the original array --> rowise. This means that our first three features (with the same timestamp) are always seperated with a distance equal to the total number of rows
  for (j in 1:num_rows) {
    position_x[[j]] <- raw_data[["M.i.k"]][[j]]
    position_y[[j]] <- raw_data[["M.i.k"]][[num_rows + j]]
    stride_time[[j]] <- raw_data[["M.i.k"]][[2 * num_rows + j]]
    dataholder <- raw_data[["M.i.k"]][[16 * num_rows + j]] # data holder for time
    # Computing the three needed kinematic features
    stride_length[[j]] <-
      range(position_x[[j]])[2] - range(position_x[[j]])[1]
    stride_height[[j]] <-
      range(position_y[[j]])[2] - range(position_y[[j]])[1]
    stride_duration[[j]] <-
      range(stride_time[[j]])[2] - range(stride_time[[j]])[1]
    time_in_sec[[j]] <- lapply(dataholder, mean)# using mean time of stride as a time stamp
  }
  
  # Scaling and creating one data frame per subject
  assign(paste0("subject", i, "_features"), 
         data.frame(time.from.start = unlist(time_in_sec), 
                    scaled.stride.len = unlist(stride_length)/subject_heights[i], 
                    scaled.stride.height = unlist(stride_height) / subject_heights[i], 
                    stride.duration = unlist(stride_duration)
                    )
         )
  
  # Creating a Summary Data Frame
  df_name <- paste0("subject", i, "_features")
  summary_df[i, 1] <- paste0("subject", i)
  summary_df[i, 2] <- get(df_name) %>% nrow()
  summary_df[i, 3] <- get(df_name) %>% ncol()
  summary_df[i, 4] <- get(df_name)[, 2] %>% mean() %>% round(digits = 4)
  summary_df[i, 5] <- get(df_name)[, 2] %>% sd() %>% round(digits = 4)
  summary_df[i, 6] <- get(df_name)[, 3] %>% mean() %>% round(digits = 4)
  summary_df[i, 7] <- get(df_name)[, 3] %>% sd() %>% round(digits = 4)
  summary_df[i, 8] <- get(df_name)[, 4] %>% mean() %>% round(digits = 4)
  summary_df[i, 9] <- get(df_name)[, 4] %>% sd() %>% round(digits = 4)
}
# Printing the top six rows of Subject 4's data as an example
head(subject4_features) %>% round(digits = 3)
# A Summary of the features for all 15 participants
summary_df
rm(raw_data, raw_data_size, i, j, num_rows, 
   dataholder, num_subjects)
save.image(file = "./Data/RGenerated/FeatureGeneration.RData")

Based on the analysis above, there are three observations to be made. First, we scaled the stride length and height based on the subject’s height. This in essence allows us to capture the steps as a percentage of the person’s height. This reduces the between subject variablity in the data and is supported by the seminal work of Oberg et al., 1993. Second, we showed the first six rows from Subject 4 to provide the reader with some insights into the sampling frequeny (after the preprocessing done in Baghdadi et al. 2018). Note that the kinematic features are computed from the sensor channels provided in their paper. Third, we saved the generated data into an R.data file, which can be accessed by clicking on: FeatureGeneration.RData. We hope that this saved data allows other researchers to reproduce and/or build on our work.


2 Detecting and Removing Outliers

In this task, we examined several approaches for outlier detection. We originally hypothesized that these outliers constitue faulty sensor readings (since they are not sustained), and thus, we can remove this data prior to any further analysis. However, based on a detailed investigation of the data and the application of several outlier detection methods (univariate and multivariate), we have learned that the data is non-normal, highly autocorrelated and non-stationary. Thus, these approaches are not suitable for removing the outliers from our data. Thus, we then investiaged the utilization of “filtering” based methods in Section 3 to impute/correct/smoothen the data.

For the sake of reproducibility and completeness, we provide the results from our implementation of several outlier detection methods in the subsections below. Note that these methods will not be incorporated in our final analysis. They are presented here only to document how we reached our final model/approach.

2.1 Plotting the Data

First, we use a standard line plot to depict the data for each feature by person. For the sake of facilitating the visualization process, we: (a) panel the plot such that the left, center and right panels correspond to the scaled stride length, scaled stride height and step duration, respectively; and (b) we save the results of each participant (P) in a different tab.

# We are reading the data locally (you can download the file from GitHub)
load(file = "./Data/RGenerated/FeatureGeneration.RData")
for (i in 1:15) {
  df_transformed <- melt(get(paste0("subject",i,"_features")),
                         id.vars = "time.from.start",
                         measure.vars=c("scaled.stride.len",
                                        "scaled.stride.height",
                                        "stride.duration"
                                        )
                         )
  assign(paste0("p",i),
         ggplot(data = df_transformed,
                aes(x=time.from.start,y=value,group=variable,color=variable,
                    shape=variable)) + 
           geom_line() + theme_bw() + 
           #ylim(0,2) +
           ggtitle(paste0("Participant ",i,": Pre-filtered Data")) + 
           theme(legend.position="none") + 
           facet_wrap(~variable,nrow=3, scales = "free_y")
         )
  cat('###',paste0("P", i), "{-}",'\n')
  print(get(paste0("p",i)))
  cat('\n \n')
  
  stat_printdf <- get(paste0("subject",i,"_features"))
  
 cat(paste0('<source> <p> Based on the <b>raw data</b>, Participant ', i,
             ' has an average scaled stride length of ',
             round(mean(stat_printdf[,2]),2),
             ', an average scaled stride height of ',
             round(mean(stat_printdf[,3]),2),
             ' and an average stride duration of ', round(mean(stat_printdf[,4]),2),
             ' seconds. Based on their height of ', subject_heights[i],
             ' meters, their average stride length and stride height are', ' ',
             round(mean(stat_printdf[,2])*subject_heights[i],2),
             ' and ', ' ', round(mean(stat_printdf[,3])*subject_heights[i],2),
             ', respectively.', '</p> </source>'
             )
      )
cat(' \n \n')
}

P1

Based on the raw data, Participant 1 has an average scaled stride length of 1.07, an average scaled stride height of 0.09 and an average stride duration of 1.06 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 1.82 and 0.16, respectively.

P2

Based on the raw data, Participant 2 has an average scaled stride length of 1.63, an average scaled stride height of 0.13 and an average stride duration of 0.95 seconds. Based on their height of 1.77 meters, their average stride length and stride height are 2.88 and 0.23, respectively.

P3

Based on the raw data, Participant 3 has an average scaled stride length of 0.94, an average scaled stride height of 0.09 and an average stride duration of 1.09 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 1.61 and 0.16, respectively.

P4

Based on the raw data, Participant 4 has an average scaled stride length of 1.28, an average scaled stride height of 0.12 and an average stride duration of 0.95 seconds. Based on their height of 1.59 meters, their average stride length and stride height are 2.03 and 0.2, respectively.

P5

Based on the raw data, Participant 5 has an average scaled stride length of 0.96, an average scaled stride height of 0.09 and an average stride duration of 1.09 seconds. Based on their height of 1.69 meters, their average stride length and stride height are 1.63 and 0.15, respectively.

P6

Based on the raw data, Participant 6 has an average scaled stride length of 0.91, an average scaled stride height of 0.11 and an average stride duration of 1.06 seconds. Based on their height of 1.63 meters, their average stride length and stride height are 1.48 and 0.18, respectively.

P7

Based on the raw data, Participant 7 has an average scaled stride length of 0.84, an average scaled stride height of 0.09 and an average stride duration of 1.12 seconds. Based on their height of 1.6 meters, their average stride length and stride height are 1.34 and 0.15, respectively.

P8

Based on the raw data, Participant 8 has an average scaled stride length of 1.53, an average scaled stride height of 0.14 and an average stride duration of 0.98 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 2.62 and 0.24, respectively.

P9

Based on the raw data, Participant 9 has an average scaled stride length of 0.87, an average scaled stride height of 0.1 and an average stride duration of 1.11 seconds. Based on their height of 1.67 meters, their average stride length and stride height are 1.46 and 0.17, respectively.

P10

Based on the raw data, Participant 10 has an average scaled stride length of 1.28, an average scaled stride height of 0.11 and an average stride duration of 1.01 seconds. Based on their height of 1.78 meters, their average stride length and stride height are 2.27 and 0.2, respectively.

P11

Based on the raw data, Participant 11 has an average scaled stride length of 1.2, an average scaled stride height of 0.09 and an average stride duration of 1.05 seconds. Based on their height of 1.68 meters, their average stride length and stride height are 2.02 and 0.15, respectively.

P12

Based on the raw data, Participant 12 has an average scaled stride length of 1.61, an average scaled stride height of 0.14 and an average stride duration of 0.89 seconds. Based on their height of 1.55 meters, their average stride length and stride height are 2.5 and 0.21, respectively.

P13

Based on the raw data, Participant 13 has an average scaled stride length of 0.67, an average scaled stride height of 0.07 and an average stride duration of 1.14 seconds. Based on their height of 1.83 meters, their average stride length and stride height are 1.23 and 0.13, respectively.

P14

Based on the raw data, Participant 14 has an average scaled stride length of 1.32, an average scaled stride height of 0.14 and an average stride duration of 0.93 seconds. Based on their height of 1.81 meters, their average stride length and stride height are 2.39 and 0.25, respectively.

P15

Based on the raw data, Participant 15 has an average scaled stride length of 1.18, an average scaled stride height of 0.1 and an average stride duration of 1.04 seconds. Based on their height of 1.89 meters, their average stride length and stride height are 2.24 and 0.2, respectively.

2.2 Implementation of Several Outlier Detection Methods

2.2.1 Boxplots for Outlier Detection

One of the most commonly used approaches for univariate outliers are defined to be the observations that lie outside \(\small 1.5 * IQR\), where the IQR is the inter quartile range. This can be easily implemented using in base R using the boxplot.stats function. Below, we first provide the output and data plots for each person.

# We are reading the data locally (you can download the file from GitHub)
load(file = "./Data/RGenerated/FeatureGeneration.RData")

outliers_bp <- list() # initilizing a list to store outliers per participant
for (i in 1:15) {
  df <- get(paste0("subject",i,"_features")) # Getting Data from Sec.
  # Obtaining the Outliers for each of the three variables
  out_vals_sl<- boxplot.stats(df[,2])$out
  out_rows_sl <- which(df[,2] %in% out_vals_sl)
  out_vals_sh<- boxplot.stats(df[,3])$out
  out_rows_sh <- which(df[,3] %in% out_vals_sh)
  out_vals_sd<- boxplot.stats(df[,4])$out
  out_rows_sd <- which(df[,4] %in% out_vals_sd)
  # Generating a union set of all obs. that have outliers
  # True: if any of the 3 vars for that obs. is an outlier
  outliers_total <- unique(c(out_rows_sl, out_rows_sh,
                             out_rows_sd))
  outliers_bp[[i]] <- outliers_total # saving it to list indexed by participant number
  
  # Remove the observations corresponding to the outliers
  assign(paste0("subject",i,"_bp"), 
         df[-outliers_total,])  
  
  # Preparing the data for the Line Graph
  df_transformed <- melt(get(paste0("subject",i,"_bp")),
                         id.vars = "time.from.start",
                         measure.vars=c("scaled.stride.len",
                                        "scaled.stride.height",
                                        "stride.duration"
                                        )
                         ) # ggplot data needs to be tall
  
  assign(paste0("g",i),
         ggplot(data = df_transformed,
                aes(x=time.from.start, y=value, group=variable,
                    color=variable,shape=variable)) + 
           geom_line() + theme_bw() + 
           #ylim(0,2) +
           ggtitle("Outliers removed via the standard boxplot method \n (any point outside of 1.5*IQR was removed)") + 
           theme(legend.position="none",
                 #axis.text.x=element_text(angle=90,hjust=1),
                 plot.title = element_text(hjust = 0.5)) +
           facet_wrap(~variable,nrow=3, scales = 'free_y')
         )
  cat('####',paste0("P", i), "{-}",'\n')
  
    
    # Printing the % of outliers removed
    cat("\n")
    num_rows <- nrow(get(paste0("subject",i,"_features")))
    num_outliers <- length(outliers_total)
    percent_data <- round(100*num_outliers/num_rows,2)
    cat("<b>",paste0("% of removed Observations for Partcipant ",i,
                 ": ", percent_data,"%."), "</b>",
        "This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.")
    cat('\n \n')
  # Plotting the data without the outliers
  print(get(paste0("g",i)))
  cat('\n \n')
  
  stat_printdf <- get(paste0("subject",i,"_bp"))
  
 cat(paste0('<source> <p> After applying the <b>boxplot outlier detection methodology</b>, Participant ', i,
             ' has an average scaled stride length of ',
             round(mean(stat_printdf[,2]),2),
             ', an average scaled stride height of ',
             round(mean(stat_printdf[,3]),2),
             ' and an average stride duration of ', round(mean(stat_printdf[,4]),2),
             ' seconds. Based on their height of ', subject_heights[i],
             ' meters, their average stride length and stride height are', ' ',
             round(mean(stat_printdf[,2])*subject_heights[i],2),
             ' and ', ' ', round(mean(stat_printdf[,3])*subject_heights[i],2),
             ', respectively.', '</p> </source>'
             )
      )
  
cat('\n')
  

cat('<source> <p> Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the <i>tsoutlier() function</i> in the <a href="https://cran.r-project.org/web/packages/forecast/forecast.pdf">forecast package</a> and the <i> iqr() function </i> in the <a href="https://cran.r-project.org/web/packages/anomalize/anomalize.pdf">anomalize package</a> use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify. </p> </source>')

cat(' \n \n')
}

P1

% of removed Observations for Partcipant 1: 2.13%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.

After applying the boxplot outlier detection methodology, Participant 1 has an average scaled stride length of 1.07, an average scaled stride height of 0.09 and an average stride duration of 1.06 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 1.83 and 0.16, respectively.

Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.

P2

% of removed Observations for Partcipant 2: 0.51%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.

After applying the boxplot outlier detection methodology, Participant 2 has an average scaled stride length of 1.63, an average scaled stride height of 0.13 and an average stride duration of 0.95 seconds. Based on their height of 1.77 meters, their average stride length and stride height are 2.89 and 0.23, respectively.

Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.

P3

% of removed Observations for Partcipant 3: 6.1%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.

After applying the boxplot outlier detection methodology, Participant 3 has an average scaled stride length of 0.94, an average scaled stride height of 0.09 and an average stride duration of 1.09 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 1.61 and 0.15, respectively.

Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.

P4

% of removed Observations for Partcipant 4: 8.93%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.

After applying the boxplot outlier detection methodology, Participant 4 has an average scaled stride length of 1.32, an average scaled stride height of 0.12 and an average stride duration of 0.94 seconds. Based on their height of 1.59 meters, their average stride length and stride height are 2.1 and 0.2, respectively.

Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.

P5

% of removed Observations for Partcipant 5: 7.61%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.

After applying the boxplot outlier detection methodology, Participant 5 has an average scaled stride length of 0.99, an average scaled stride height of 0.09 and an average stride duration of 1.09 seconds. Based on their height of 1.69 meters, their average stride length and stride height are 1.66 and 0.15, respectively.

Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.

P6

% of removed Observations for Partcipant 6: 5.01%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.

After applying the boxplot outlier detection methodology, Participant 6 has an average scaled stride length of 0.91, an average scaled stride height of 0.11 and an average stride duration of 1.06 seconds. Based on their height of 1.63 meters, their average stride length and stride height are 1.49 and 0.18, respectively.

Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.

P7

% of removed Observations for Partcipant 7: 10.85%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.

After applying the boxplot outlier detection methodology, Participant 7 has an average scaled stride length of 0.85, an average scaled stride height of 0.09 and an average stride duration of 1.14 seconds. Based on their height of 1.6 meters, their average stride length and stride height are 1.36 and 0.15, respectively.

Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.

P8

% of removed Observations for Partcipant 8: 11.2%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.

After applying the boxplot outlier detection methodology, Participant 8 has an average scaled stride length of 1.6, an average scaled stride height of 0.14 and an average stride duration of 0.96 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 2.74 and 0.24, respectively.

Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.

P9

% of removed Observations for Partcipant 9: 8.76%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.

After applying the boxplot outlier detection methodology, Participant 9 has an average scaled stride length of 0.86, an average scaled stride height of 0.1 and an average stride duration of 1.12 seconds. Based on their height of 1.67 meters, their average stride length and stride height are 1.44 and 0.17, respectively.

Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.

P10

% of removed Observations for Partcipant 10: 7.32%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.

After applying the boxplot outlier detection methodology, Participant 10 has an average scaled stride length of 1.33, an average scaled stride height of 0.11 and an average stride duration of 1 seconds. Based on their height of 1.78 meters, their average stride length and stride height are 2.36 and 0.19, respectively.

Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.

P11

% of removed Observations for Partcipant 11: 8.81%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.

After applying the boxplot outlier detection methodology, Participant 11 has an average scaled stride length of 1.22, an average scaled stride height of 0.09 and an average stride duration of 1.05 seconds. Based on their height of 1.68 meters, their average stride length and stride height are 2.06 and 0.15, respectively.

Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.

P12

% of removed Observations for Partcipant 12: 10.86%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.

After applying the boxplot outlier detection methodology, Participant 12 has an average scaled stride length of 1.68, an average scaled stride height of 0.14 and an average stride duration of 0.88 seconds. Based on their height of 1.55 meters, their average stride length and stride height are 2.61 and 0.21, respectively.

Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.

P13

% of removed Observations for Partcipant 13: 7.74%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.

After applying the boxplot outlier detection methodology, Participant 13 has an average scaled stride length of 0.68, an average scaled stride height of 0.07 and an average stride duration of 1.15 seconds. Based on their height of 1.83 meters, their average stride length and stride height are 1.25 and 0.13, respectively.

Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.

P14

% of removed Observations for Partcipant 14: 10.93%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.

After applying the boxplot outlier detection methodology, Participant 14 has an average scaled stride length of 1.39, an average scaled stride height of 0.14 and an average stride duration of 0.91 seconds. Based on their height of 1.81 meters, their average stride length and stride height are 2.51 and 0.25, respectively.

Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.

P15

% of removed Observations for Partcipant 15: 4.98%. This removes any value outside 1.5 * IQR for any of the 3 features. From the code, it should be clear that we are implementing this columnwise, starting with stride length and ending with stride duration.

After applying the boxplot outlier detection methodology, Participant 15 has an average scaled stride length of 1.2, an average scaled stride height of 0.1 and an average stride duration of 1.04 seconds. Based on their height of 1.89 meters, their average stride length and stride height are 2.27 and 0.19, respectively.

Note that we could have tried making the value of the coefficient multiplied by the IQR coefficient larger (e.g., both the tsoutlier() function in the forecast package and the iqr() function in the anomalize package use a value of 3 for the coefficient); however, we did not explore this option since this: (a) would have possibly required a large amount of trial-and-error, and (b) the approach would be difficult to justify.

# Saving a list of cleaned 
save(subject1_bp, subject2_bp, subject3_bp, subject4_bp,
     subject5_bp, subject6_bp, subject7_bp, subject8_bp,
     subject9_bp, subject10_bp, subject11_bp, subject12_bp,
     subject13_bp, subject14_bp, subject15_bp, outliers_bp,
     file = "./Data/RGenerated/OutliersRemovedBoxplot.RData")

2.2.2 Mahalanobis Dist. based Method

# We are reading the data locally (you can download the file from GitHub)
load(file = "./Data/RGenerated/FeatureGeneration.RData")
mal_results <- {}
for (i in 1:15) {
  df <- get(paste0("subject",i,"_features"))
  df <- df[,2:4]#only taking the four numeric fields
  cat('####',paste0("P", i), "{-}",' \n')
  cat('The output from the <source> <i> MVN package </i> </source> is shown below. Note that we depict the multivariate normal distribution for the three features of each distribution below. </source> <b> Essentially, the goal here is two-fold. First, we would like to examine if the joint distribution of the three kinematic features can be modeled using the multivariate normal distribution. Second, if the first hypothesis is true, then we can use the Mahalanobis Distance to identify outliers in each observation. </b> </source>')

  plot.new()
  mvn_test <- mvn(df, mvnTest = "dh",desc = TRUE,
                univariateTest = "Lillie",
                multivariateOutlierMethod = "adj",
                showOutliers = TRUE) #adjusted Mahl Distance 
 cat(' \n')
  
  cat(paste0('In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant ',i,' below. \n \n'))
 mvn_Normality <- mvn_test$multivariateNormality
 rownames(mvn_Normality) <- paste('Participant',i)
 tab <- xtable::xtable(mvn_test$multivariateNormality,
                       align = c("c","c","c","c","c","c"))
 print(tab, type= 'html',
       html.table.attributes = 'align="center", rules="rows", 
                                width=50%, frame="below"')
  cat('<source> <p> \n </p> </source>')
  
  cat(paste0('Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant ', i, ' but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.'))
  
  cat(' \n \n')
  
}

P1

The output from the MVN package is shown below. Note that we depict the multivariate normal distribution for the three features of each distribution below. Essentially, the goal here is two-fold. First, we would like to examine if the joint distribution of the three kinematic features can be modeled using the multivariate normal distribution. Second, if the first hypothesis is true, then we can use the Mahalanobis Distance to identify outliers in each observation. In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 1 below.

Test E df p value MVN
1 Doornik-Hansen 14677.13 6.00 0.00 NO

Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 1 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.

P2

The output from the MVN package is shown below. Note that we depict the multivariate normal distribution for the three features of each distribution below. Essentially, the goal here is two-fold. First, we would like to examine if the joint distribution of the three kinematic features can be modeled using the multivariate normal distribution. Second, if the first hypothesis is true, then we can use the Mahalanobis Distance to identify outliers in each observation. In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 2 below.

Test E df p value MVN
1 Doornik-Hansen 7021.46 6.00 0.00 NO

Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 2 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.

P3

The output from the MVN package is shown below. Note that we depict the multivariate normal distribution for the three features of each distribution below. Essentially, the goal here is two-fold. First, we would like to examine if the joint distribution of the three kinematic features can be modeled using the multivariate normal distribution. Second, if the first hypothesis is true, then we can use the Mahalanobis Distance to identify outliers in each observation. In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 3 below.

Test E df p value MVN
1 Doornik-Hansen 2778.83 6.00 0.00 NO

Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 3 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.

P4

The output from the MVN package is shown below. Note that we depict the multivariate normal distribution for the three features of each distribution below. Essentially, the goal here is two-fold. First, we would like to examine if the joint distribution of the three kinematic features can be modeled using the multivariate normal distribution. Second, if the first hypothesis is true, then we can use the Mahalanobis Distance to identify outliers in each observation. In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 4 below.

Test E df p value MVN
1 Doornik-Hansen 10446.56 6.00 0.00 NO

Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 4 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.

P5

The output from the MVN package is shown below. Note that we depict the multivariate normal distribution for the three features of each distribution below. Essentially, the goal here is two-fold. First, we would like to examine if the joint distribution of the three kinematic features can be modeled using the multivariate normal distribution. Second, if the first hypothesis is true, then we can use the Mahalanobis Distance to identify outliers in each observation. In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 5 below.

Test E df p value MVN
1 Doornik-Hansen 6516.14 6.00 0.00 NO

Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 5 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.

P6

The output from the MVN package is shown below. Note that we depict the multivariate normal distribution for the three features of each distribution below. Essentially, the goal here is two-fold. First, we would like to examine if the joint distribution of the three kinematic features can be modeled using the multivariate normal distribution. Second, if the first hypothesis is true, then we can use the Mahalanobis Distance to identify outliers in each observation. In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 6 below.

Test E df p value MVN
1 Doornik-Hansen 15389.84 6.00 0.00 NO

Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 6 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.

P7

The output from the MVN package is shown below. Note that we depict the multivariate normal distribution for the three features of each distribution below. Essentially, the goal here is two-fold. First, we would like to examine if the joint distribution of the three kinematic features can be modeled using the multivariate normal distribution. Second, if the first hypothesis is true, then we can use the Mahalanobis Distance to identify outliers in each observation. In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 7 below.

Test E df p value MVN
1 Doornik-Hansen 3109.05 6.00 0.00 NO

Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 7 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.

P8

The output from the MVN package is shown below. Note that we depict the multivariate normal distribution for the three features of each distribution below. Essentially, the goal here is two-fold. First, we would like to examine if the joint distribution of the three kinematic features can be modeled using the multivariate normal distribution. Second, if the first hypothesis is true, then we can use the Mahalanobis Distance to identify outliers in each observation. In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 8 below.

Test E df p value MVN
1 Doornik-Hansen 6399.80 6.00 0.00 NO

Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 8 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.

P9

The output from the MVN package is shown below. Note that we depict the multivariate normal distribution for the three features of each distribution below. Essentially, the goal here is two-fold. First, we would like to examine if the joint distribution of the three kinematic features can be modeled using the multivariate normal distribution. Second, if the first hypothesis is true, then we can use the Mahalanobis Distance to identify outliers in each observation. In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 9 below.

Test E df p value MVN
1 Doornik-Hansen 5815.28 6.00 0.00 NO

Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 9 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.

P10

The output from the MVN package is shown below. Note that we depict the multivariate normal distribution for the three features of each distribution below. Essentially, the goal here is two-fold. First, we would like to examine if the joint distribution of the three kinematic features can be modeled using the multivariate normal distribution. Second, if the first hypothesis is true, then we can use the Mahalanobis Distance to identify outliers in each observation. In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 10 below.

Test E df p value MVN
1 Doornik-Hansen 14166.64 6.00 0.00 NO

Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 10 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.

P11

The output from the MVN package is shown below. Note that we depict the multivariate normal distribution for the three features of each distribution below. Essentially, the goal here is two-fold. First, we would like to examine if the joint distribution of the three kinematic features can be modeled using the multivariate normal distribution. Second, if the first hypothesis is true, then we can use the Mahalanobis Distance to identify outliers in each observation. In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 11 below.

Test E df p value MVN
1 Doornik-Hansen 15132.69 6.00 0.00 NO

Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 11 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.

P12

The output from the MVN package is shown below. Note that we depict the multivariate normal distribution for the three features of each distribution below. Essentially, the goal here is two-fold. First, we would like to examine if the joint distribution of the three kinematic features can be modeled using the multivariate normal distribution. Second, if the first hypothesis is true, then we can use the Mahalanobis Distance to identify outliers in each observation. In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 12 below.

Test E df p value MVN
1 Doornik-Hansen 8902.87 6.00 0.00 NO

Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 12 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.

P13

The output from the MVN package is shown below. Note that we depict the multivariate normal distribution for the three features of each distribution below. Essentially, the goal here is two-fold. First, we would like to examine if the joint distribution of the three kinematic features can be modeled using the multivariate normal distribution. Second, if the first hypothesis is true, then we can use the Mahalanobis Distance to identify outliers in each observation. In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 13 below.

Test E df p value MVN
1 Doornik-Hansen 6561.33 6.00 0.00 NO

Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 13 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.

P14

The output from the MVN package is shown below. Note that we depict the multivariate normal distribution for the three features of each distribution below. Essentially, the goal here is two-fold. First, we would like to examine if the joint distribution of the three kinematic features can be modeled using the multivariate normal distribution. Second, if the first hypothesis is true, then we can use the Mahalanobis Distance to identify outliers in each observation. In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 14 below.

Test E df p value MVN
1 Doornik-Hansen 7265.08 6.00 0.00 NO

Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 14 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.

P15

The output from the MVN package is shown below. Note that we depict the multivariate normal distribution for the three features of each distribution below. Essentially, the goal here is two-fold. First, we would like to examine if the joint distribution of the three kinematic features can be modeled using the multivariate normal distribution. Second, if the first hypothesis is true, then we can use the Mahalanobis Distance to identify outliers in each observation. In addition to the plot, we provide the statistics of the Doornik-Hansen test for Participant 15 below.

Test E df p value MVN
1 Doornik-Hansen 22184.55 6.00 0.00 NO

Based on the MVN test and the plot above, it is clear that the data is not normally distributed. Thus, an outlier detection/removal algorithm based on a MVN distribution will be inappropriate. This observation is true to not only Participant 15 but to also all the other participants. Thus, in the following section we will investigate method(s) based on the signal processing literature detection literature.

3 Filtering Approaches and Data Preprocessing Approaches

An alternative approach to detecting and then removing outliers is to use smoothing techniques to “correct” the data. Based on the results shown in the previous sections, the fundamental hypotheses behind this approach are:

  1. The sensors’ data seem to be non-stationary, autocorrelated and somewhat complex (from a statistical perspective). More importantly, there are spikes in the data that cannot be explained by the kinematic model. These spikes produce values that are unrealistic (e.g., a scaled stride length > 2 is not possible since that means that the stride of the person is two times larger than their height). While this insight is useful, there is no hard limit that we can enforce since the literature reports mean values across the population instead of attempting to provide a physical threshold on what is possible.

  2. The outliers tended to appear in pairs of two; indicating a possible issue in the segmentation of the data. Note that we label this data as outliers since it is not sustained (i.e., it is unlikely to say that a person is fatigued for only two stride).

  3. Smoothing (i.e. a low pass filter on the data) will allow us to impute values for every stride (after an initial size window). This will allow us to preserve the successive nature of the strides, “fix” the outliers based on neighboring strides, and not throw away any observation.

3.1 The Median Filtering Approach from the Signal Processing Literature

An algorithmic approach based on these observations is implemented below. Note that we are using the median filter from the fractal package on CRAN. Note that in the analysis below, we are applying the median Filter, with n=29 (i.e., the median is calculated based on a overlaping moving window of size 29 strides). We chose this window size based on the recommendation of our biomechanics expert as it allows us to: (a) ignore the effect of any turns in the direction of motion; and (b) the corresponding window size is approximately 30 seconds long (which presents an easy to remember rule for practical application).

# We are reading the data locally (you can download the file from GitHub)
load(file = "./Data/RGenerated/FeatureGeneration.RData")
for (i in 1:15) {
  df <- get(paste0("subject",i,"_features")) # Getting Data from Sec.
  # Obtaining the Outliers for each of the three variables
  time.from.start <- df[,1]
  scaled.stride.len <- medianFilter(df[,2], order = 29) # moving window of size 29 obs
  scaled.stride.height <- medianFilter(df[,3], order = 29) # moving window of size 29 obs
  stride.duration <- medianFilter(df[,4], order = 29) # moving window of size 29 obs
 
  # Computing the first difference to make seein the values easier
  diff.time.from.start <- time.from.start[-1]
  diff.scaled.stride.len <- diff(scaled.stride.len)
  diff.scaled.stride.height <- diff(scaled.stride.height)
  diff.stride.duration <- diff(stride.duration)
  # Remove the observations corresponding to the outliers
  assign(paste0("subject",i,"_medianf"), 
         data.frame(cbind(time.from.start, scaled.stride.len, 
                          scaled.stride.height, stride.duration)))
  
  assign(paste0("subject",i,"_median_diff_f"), 
         data.frame(cbind( diff.time.from.start, diff.scaled.stride.len, 
                          diff.scaled.stride.height, diff.stride.duration)))
  
  # Preparing the data for the Line Graph
  df_transformed <- melt(get(paste0("subject",i,"_medianf")),
                         id.vars = "time.from.start",
                         measure.vars=c("scaled.stride.len",
                                        "scaled.stride.height",
                                        "stride.duration"
                                        )
                         ) # ggplot data needs to be tall
  
  diff_df_transformed  <- melt(get(paste0("subject",i,"_median_diff_f")),
                         id.vars = "diff.time.from.start",
                         measure.vars=c("diff.scaled.stride.len",
                                        "diff.scaled.stride.height",
                                        "diff.stride.duration"
                                        )
                         ) # ggplot data needs to be tall

  assign(paste0("g",i),
         ggplot(data = df_transformed,
                aes(x=time.from.start, y=value, group=variable,
                    color=variable,shape=variable)) + 
           geom_line() + theme_bw() + 
           ggtitle("Data post the application of the median filter") + 
           theme(legend.position="none",
                 #axis.text.x=element_text(angle=90,hjust=1),
                 plot.title = element_text(hjust = 0.5)) +
           facet_wrap(~variable,nrow=3, scales = "free_y")
         )
  
  assign(paste0("h",i),
         ggplot(data = diff_df_transformed,
                aes(x=diff.time.from.start, y=value, group=variable,
                    color=variable,shape=variable)) + 
           geom_line() + theme_bw() + 
           ggtitle("First Difference of the Signal post the median filter") + 
           theme(legend.position="none",
                 #axis.text.x=element_text(angle=90,hjust=1),
                 plot.title = element_text(hjust = 0.5)) +
           facet_wrap(~variable,nrow=3, scales = "free_y")
         )
  
  cat('###',paste0("P", i), "{-}",'\n')
  
  print(get(paste0("g",i)))
  
  cat('\n')
  
  cat(paste0('<source> <p> Based on the <b>median filter</b>, Participant ', i,
             ' has an average scaled stride length of ',
             round(mean(scaled.stride.len),2),
             ', an average scaled stride height of ',
             round(mean(scaled.stride.height),2),
             ' and an average stride duration of ', round(mean(stride.duration),2),
             ' seconds. Based on their height of ', subject_heights[i],
             ' meters, their average stride length and stride height are', ' ',
             round(mean(scaled.stride.len)*subject_heights[i],2),
             ' and ', ' ', round(mean(scaled.stride.height)*subject_heights[i],2),
             ', respectively.', '</p> </source>'
             )
      )
  
  cat('\n')
  
  print(get(paste0("h",i)))
  
  cat(paste0('<source> <p> Based on the <b>median filter </b>the interquarile range of the <b>first difference </b> of Participant ', i,
             '\'s scaled stride length, scaled stride height and stride duration are',
             ' ',
             round(IQR(diff.scaled.stride.len),3),
             ', ', round(IQR(diff.scaled.stride.height),3),
             ', and ', round(IQR(diff.stride.duration),3),
             ', respectively. </p> </source>'
             )
      )
    
  cat(' \n \n')
  cat('<source> <p> The data processed with the median filtered approach is stored at the <a href="https://github.com/fmegahed/fatigue-changepoint/blob/master/Data/RGenerated/MedianFilteredData.RData">following location</a> in our GitHub Repository.')
  cat('\n \n')
  
}

P1

Based on the median filter, Participant 1 has an average scaled stride length of 1.08, an average scaled stride height of 0.09 and an average stride duration of 1.06 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 1.84 and 0.16, respectively.

Based on the median filter the interquarile range of the first difference of Participant 1’s scaled stride length, scaled stride height and stride duration are 0.001, 0, and 0, respectively.

The data processed with the median filtered approach is stored at the following location in our GitHub Repository.

P2

Based on the median filter, Participant 2 has an average scaled stride length of 1.65, an average scaled stride height of 0.13 and an average stride duration of 0.94 seconds. Based on their height of 1.77 meters, their average stride length and stride height are 2.91 and 0.24, respectively.

Based on the median filter the interquarile range of the first difference of Participant 2’s scaled stride length, scaled stride height and stride duration are 0.001, 0, and 0, respectively.

The data processed with the median filtered approach is stored at the following location in our GitHub Repository.

P3

Based on the median filter, Participant 3 has an average scaled stride length of 0.94, an average scaled stride height of 0.09 and an average stride duration of 1.08 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 1.61 and 0.15, respectively.

Based on the median filter the interquarile range of the first difference of Participant 3’s scaled stride length, scaled stride height and stride duration are 0.003, 0, and 0, respectively.

The data processed with the median filtered approach is stored at the following location in our GitHub Repository.

P4

Based on the median filter, Participant 4 has an average scaled stride length of 1.31, an average scaled stride height of 0.13 and an average stride duration of 0.93 seconds. Based on their height of 1.59 meters, their average stride length and stride height are 2.09 and 0.2, respectively.

Based on the median filter the interquarile range of the first difference of Participant 4’s scaled stride length, scaled stride height and stride duration are 0.002, 0, and 0, respectively.

The data processed with the median filtered approach is stored at the following location in our GitHub Repository.

P5

Based on the median filter, Participant 5 has an average scaled stride length of 0.98, an average scaled stride height of 0.09 and an average stride duration of 1.09 seconds. Based on their height of 1.69 meters, their average stride length and stride height are 1.66 and 0.15, respectively.

Based on the median filter the interquarile range of the first difference of Participant 5’s scaled stride length, scaled stride height and stride duration are 0.002, 0, and 0, respectively.

The data processed with the median filtered approach is stored at the following location in our GitHub Repository.

P6

Based on the median filter, Participant 6 has an average scaled stride length of 0.92, an average scaled stride height of 0.11 and an average stride duration of 1.05 seconds. Based on their height of 1.63 meters, their average stride length and stride height are 1.5 and 0.17, respectively.

Based on the median filter the interquarile range of the first difference of Participant 6’s scaled stride length, scaled stride height and stride duration are 0.002, 0, and 0, respectively.

The data processed with the median filtered approach is stored at the following location in our GitHub Repository.

P7

Based on the median filter, Participant 7 has an average scaled stride length of 0.85, an average scaled stride height of 0.09 and an average stride duration of 1.13 seconds. Based on their height of 1.6 meters, their average stride length and stride height are 1.36 and 0.15, respectively.

Based on the median filter the interquarile range of the first difference of Participant 7’s scaled stride length, scaled stride height and stride duration are 0.001, 0, and 0, respectively.

The data processed with the median filtered approach is stored at the following location in our GitHub Repository.

P8

Based on the median filter, Participant 8 has an average scaled stride length of 1.59, an average scaled stride height of 0.14 and an average stride duration of 0.96 seconds. Based on their height of 1.71 meters, their average stride length and stride height are 2.72 and 0.24, respectively.

Based on the median filter the interquarile range of the first difference of Participant 8’s scaled stride length, scaled stride height and stride duration are 0.004, 0, and 0, respectively.

The data processed with the median filtered approach is stored at the following location in our GitHub Repository.

P9

Based on the median filter, Participant 9 has an average scaled stride length of 0.87, an average scaled stride height of 0.1 and an average stride duration of 1.11 seconds. Based on their height of 1.67 meters, their average stride length and stride height are 1.46 and 0.17, respectively.

Based on the median filter the interquarile range of the first difference of Participant 9’s scaled stride length, scaled stride height and stride duration are 0.002, 0, and 0, respectively.

The data processed with the median filtered approach is stored at the following location in our GitHub Repository.

P10

Based on the median filter, Participant 10 has an average scaled stride length of 1.34, an average scaled stride height of 0.11 and an average stride duration of 0.99 seconds. Based on their height of 1.78 meters, their average stride length and stride height are 2.38 and 0.2, respectively.

Based on the median filter the interquarile range of the first difference of Participant 10’s scaled stride length, scaled stride height and stride duration are 0.003, 0, and 0, respectively.

The data processed with the median filtered approach is stored at the following location in our GitHub Repository.

P11

Based on the median filter, Participant 11 has an average scaled stride length of 1.22, an average scaled stride height of 0.09 and an average stride duration of 1.05 seconds. Based on their height of 1.68 meters, their average stride length and stride height are 2.04 and 0.15, respectively.

Based on the median filter the interquarile range of the first difference of Participant 11’s scaled stride length, scaled stride height and stride duration are 0.002, 0, and 0, respectively.

The data processed with the median filtered approach is stored at the following location in our GitHub Repository.

P12

Based on the median filter, Participant 12 has an average scaled stride length of 1.64, an average scaled stride height of 0.14 and an average stride duration of 0.88 seconds. Based on their height of 1.55 meters, their average stride length and stride height are 2.55 and 0.22, respectively.

Based on the median filter the interquarile range of the first difference of Participant 12’s scaled stride length, scaled stride height and stride duration are 0.001, 0, and 0, respectively.

The data processed with the median filtered approach is stored at the following location in our GitHub Repository.

P13

Based on the median filter, Participant 13 has an average scaled stride length of 0.68, an average scaled stride height of 0.07 and an average stride duration of 1.15 seconds. Based on their height of 1.83 meters, their average stride length and stride height are 1.24 and 0.13, respectively.

Based on the median filter the interquarile range of the first difference of Participant 13’s scaled stride length, scaled stride height and stride duration are 0.002, 0, and 0, respectively.

The data processed with the median filtered approach is stored at the following location in our GitHub Repository.

P14

Based on the median filter, Participant 14 has an average scaled stride length of 1.37, an average scaled stride height of 0.14 and an average stride duration of 0.92 seconds. Based on their height of 1.81 meters, their average stride length and stride height are 2.48 and 0.25, respectively.

Based on the median filter the interquarile range of the first difference of Participant 14’s scaled stride length, scaled stride height and stride duration are 0.002, 0, and 0, respectively.

The data processed with the median filtered approach is stored at the following location in our GitHub Repository.

P15

Based on the median filter, Participant 15 has an average scaled stride length of 1.22, an average scaled stride height of 0.1 and an average stride duration of 1.03 seconds. Based on their height of 1.89 meters, their average stride length and stride height are 2.3 and 0.19, respectively.

Based on the median filter the interquarile range of the first difference of Participant 15’s scaled stride length, scaled stride height and stride duration are 0.002, 0, and 0, respectively.

The data processed with the median filtered approach is stored at the following location in our GitHub Repository.

# Saving a list of cleaned 
save(subject1_medianf, subject2_medianf, subject3_medianf, subject4_medianf,
     subject5_medianf, subject6_medianf, subject7_medianf, subject8_medianf,
     subject9_medianf, subject10_medianf, subject11_medianf, subject12_medianf,
     subject13_medianf, subject14_medianf, subject15_medianf,
     subject1_median_diff_f, subject2_median_diff_f, subject3_median_diff_f,
     subject4_median_diff_f,
     subject5_median_diff_f, subject6_median_diff_f, subject7_median_diff_f,
     subject8_median_diff_f,
     subject9_median_diff_f, subject10_median_diff_f, subject11_median_diff_f,
     subject12_median_diff_f,
     subject13_median_diff_f, subject14_median_diff_f, subject15_median_diff_f,
     file = "./Data/RGenerated/MedianFilteredData.RData")

3.2 Applying the Median Filter Methodology on the CUSUM of the Features

  load(file = "./Data/RGenerated/MedianFilteredData.RData")
  for (i in 1:15) {
    df <- get(paste0("subject",i,"_medianf"))
  means <- colMeans(df)
  df[,2] <- cumsum(df[,2]-means[2])
  df[,3] <- cumsum(df[,3]-means[3])
  df[,4] <- cumsum(df[,4]-means[4])
  assign(paste0("subject",i,"_cusum"), df)  
    
      cat('###',paste0("P", i), "{-}",'\n')
      df_transformed <- melt(get(paste0("subject",i,"_cusum")),
                         id.vars = "time.from.start",
                         measure.vars=c("scaled.stride.len",
                                        "scaled.stride.height",
                                        "stride.duration"
                                        )
                         ) # ggplot data needs to be tall
      
      assign(paste0("g",i),
         ggplot(data = df_transformed,
                aes(x=time.from.start, y=value, group=variable,
                    color=variable,shape=variable)) + 
           geom_line() + theme_bw() + 
           ggtitle("CUSUM of the Median Filtered Data") + 
           theme(legend.position="none",
                 #axis.text.x=element_text(angle=90,hjust=1),
                 plot.title = element_text(hjust = 0.5)) +
           facet_wrap(~variable,nrow=3, scales = "free_y") 
         )
      
      print(get(paste0("g",i)))
  
  cat('\n')
  
  cat('<source> <p> The results from the analysis above can be found at: <a href="https://github.com/fmegahed/fatigue-changepoint/blob/master/Data/RGenerated/CusumData.RData">CusumData.RData</a>. </p> </source>')
  cat('\n \n')
  }

P1

The results from the analysis above can be found at: CusumData.RData.

P2

The results from the analysis above can be found at: CusumData.RData.

P3

The results from the analysis above can be found at: CusumData.RData.

P4

The results from the analysis above can be found at: CusumData.RData.

P5

The results from the analysis above can be found at: CusumData.RData.

P6

The results from the analysis above can be found at: CusumData.RData.

P7

The results from the analysis above can be found at: CusumData.RData.

P8

The results from the analysis above can be found at: CusumData.RData.

P9

The results from the analysis above can be found at: CusumData.RData.

P10

The results from the analysis above can be found at: CusumData.RData.

P11

The results from the analysis above can be found at: CusumData.RData.

P12

The results from the analysis above can be found at: CusumData.RData.

P13

The results from the analysis above can be found at: CusumData.RData.

P14

The results from the analysis above can be found at: CusumData.RData.

P15

The results from the analysis above can be found at: CusumData.RData.

save(subject1_cusum, subject2_cusum, subject3_cusum,
     subject4_cusum, subject5_cusum, subject6_cusum,
     subject7_cusum, subject8_cusum, subject9_cusum,
     subject10_cusum, subject11_cusum, subject12_cusum,
     subject13_cusum, subject14_cusum, subject15_cusum,
       file = "./Data/RGenerated/CusumData.RData")

4 Univariate Changepoint Detection

A common approach for changepoint detection is to project the multivariate problem into a univariate-space using principal component analysis. Then, apply a univariate changepoint detection approach on the problem. We examine this approach in this section.

4.1 Applied to the PCs of the Median Filtered Kinematic Features

When we tested this, we found the following:
(A) The e.divise() produces a lot of changepoints with the deafult sig.level;
(B) Number of changepoints is still large with sig.lvl = 0.001;

For this reason, we do not think that this approach is promising. We will move on to the multivariate approach.

RatingThresh <- 13  # based on Maman et al. 2017
RPE = read.csv("https://raw.githubusercontent.com/fmegahed/fatigue-changepoint/master/Data/Raw/RPE.csv") # to read subjective rating of RPE data from GitHub'
colnames(RPE)[1] <- "time.from.start"
df_RPE_transformed <- melt(RPE,
                           id.vars = "time.from.start",
                           measure.vars = colnames(RPE)[2:16])
df_RPE_transformed[,2] <- paste0(df_RPE_transformed[,2],".RPE")
# to load previously generated data and changepoints 
load(file = "./Data/RGenerated/MedianFilteredData.RData")
pca_subjects <- list()
pca_subjects_loadings <- list()
pc_prop_var_explained <- list()
changepoints_pca_ecp <- list()
changepoints_subj_thr <- list()
changepoints_subj_ecp <- list()
changepoints_cusum_subj_ecp <- list()

for (i in 1:15) {
  cumsum_RPE <- data.frame(matrix(ncol=3, nrow=18)) # need to initialize every time
  colnames(cumsum_RPE) <- c("time.from.start","variable","value")
  correct_subject_rows <- which(df_RPE_transformed[,2]==paste0("Subject.",i,".RPE"))
  dataholder <- df_RPE_transformed[correct_subject_rows,]
  # Computing the CUSUMs of the RPEs
  cumsum_RPE[,1] <- dataholder[,1]
  cumsum_RPE[,2] <- paste0(dataholder[,2],".CUSUM")
  cumsum_RPE[,3] <- cumsum(dataholder[,3]-13)
  cumsum_RPE <- na.omit(cumsum_RPE)
  
  # -------- Calculation of Fatigue based on RPE>= 13 -----------------
  fatigue_from_RPE_threshold <- which(RPE[,i+1]>=RatingThresh) %>% 
                              min() # returns window number
  changepoints_subj_thr[[i]] <- RPE[fatigue_from_RPE_threshold,1]
  df_RPE<-data.frame(d1=RPE[1],t1=RPE[i+1]) %>% na.omit() # to make dataframe of subjective ratings for ggplot
  
  #------- Estimation of Fatigue Changepoints based on ECP of the RPE Data --
  y <- e.divisive(df_RPE,min.size = 2)$estimates
  fatigue_from_ecp_changepoint = y[2:(length(y)-1)] # removing the first and last points as trivials changepoints
  changepoints_subj_ecp[[i]] <- df_RPE[fatigue_from_ecp_changepoint,1]
  
  # ------ Repeating the above analysis for the CUSUMs of the RPE --------
  y_cumsum <- e.divisive(cumsum_RPE[,c(1,3)],min.size = 2)$estimates
  fatigue_ecp_CusumRPE <- y_cumsum[2:length(y_cumsum)-1]
  changepoints_cusum_subj_ecp[[i]] <- cumsum_RPE[fatigue_ecp_CusumRPE, 1]
  
  # ------Computing the First Principal Component and the loadings for First PC ---
  df <- paste0("subject",i,"_medianf") %>% get()
  pca_dataholder <-  prcomp(df[,2:4], center=TRUE, scale. = T) # Scale <- Standardize Vars
  pca_subjects_loadings[[i]] <- pca_dataholder$rotation[,1] # Weights for First PC
  # Understanding the % Variation Explained by the First PC
  pc_std <- pca_dataholder$sdev
  pc_var <- pc_std^2
  pc_prop_var_explained[[i]] <- round(pc_var[1]/sum(pc_var),3)
  #  Storing the Data for First PC rotated values
  pca_subjects[[i]] <- data.frame(time.from.start= df[,1], 
                                variable= "PC1.Values",
                                value =pca_dataholder$x[,1]) # First PC
  df_transformed <- pca_subjects[[i]]
  df_transformed <- rbind(df_transformed,df_RPE_transformed[correct_subject_rows,],
                        cumsum_RPE)
  # The Changepoints for the PCA Data based on e.divisive
  # pca_df <- pca_subjects[[i]]
  # z <- e.divisive(pca_df[,c(1,3)],min.size = 120, sig.lvl = 0.001)$estimates
  # changepoints_pca_ecp[[i]] <- pca_df[z[2:(length(z)-1)],]
  
  cat('###',paste0("P", i), "{-}",'\n')
  assign(paste0("g",i), biplot(pca_dataholder, 
                               scale = 0))
  print(get(paste0("g",i)))
 
  cat("\n")
  # pca_vector <- pca_df[,c(1)]
  
  assign(paste0("h",i),
        ggplot(data = df_transformed,
               aes(x=time.from.start, y=value, colour = variable)) + 
          geom_line(size=1) +
          theme_bw() +
          scale_colour_manual(values = c("maroon","black","grey40")) +
          facet_wrap(~variable,nrow=3, scales = "free_y") +
          # labs(title = paste("A paneled time-series plot for Participant",i)) + 
          # geom_vline(xintercept= pca_vector, 
          #            show.legend = T, size=0.5) + 
          geom_vline(xintercept= changepoints_subj_thr[[i]],
                     linetype="solid", color="steelblue1",
                     show.legend = T, size=2) +
          geom_vline(xintercept= changepoints_subj_ecp[[i]],
                     linetype="dotted", color="gray80",
                     show.legend =T, size=2) +
          geom_vline(xintercept= changepoints_cusum_subj_ecp[[i]],
                     linetype="dashed", color="orange1",
                     show.legend =T, size=3) +
          theme(legend.position="none",
                 plot.title = element_text(hjust = 0.5))
      )
      
    print(get(paste0("h",i)))
  
    cat('\n \n')
    
    # First, the <b> black  lines are used to denote the changepoints based on the non-parameteric, univariate, ECP approach based on the e.divisive(min.size=2) on the First Principal Component of the Kinematic Features. </b> Second
  
    cat(paste0('<source> <p> In the above figure, the colored vertical lines correspond to the following.  First, the <font color= #63B8FF> <b>steelblue line  correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. </b> </font> Note that we only indicate the first occurence when the RPE>=13 on the graphs. Second, the <b> <font color=#CCCCCC> gray lines  denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. </font> </b>',
'. Third, the <b> <font color=#FFA500> orange lines  are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values. </font> </b> \n \n'))
  
# <source> <p> Based on the <b>ecp package, the number of changepoints for the First PC of the median filtered data for participant ', i,
 #            ' is equal to: ',
             # length(changepoints_pca_ecp[[i]]),
             # '. These changepoints are located at: ',
             # paste(changepoints_pca_ecp[[i]], collapse = ", "),
#             '.  <font color= #63B8FF>    
    
    cat(paste0('<source> <p>  <font color= #63B8FF> <b> The threshold based changepoint in the subjective RPE is located at: ',
             paste(changepoints_subj_thr[[i]]),
             '. </font> </b> In addition, <b> <font color=#CCCCCC> the number of changepoints based on the RPE values for this subject is equal to: ',
             length(changepoints_subj_ecp[[i]]),
             ". These changepoints are located at: ",
             paste(changepoints_subj_ecp[[i]], collapse = ", "),
             ". </font> </b> Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. <b> <font color=#FFA500> The number of these changepoints is equal to: ",
             length(changepoints_cusum_subj_ecp[[i]]),
             ". These changepoints are located at: ",
             paste(changepoints_cusum_subj_ecp[[i]], collapse = ", "),
             '. </b> </font> The results from the analysis above can be found at: <a href="https://github.com/fmegahed/fatigue-changepoint/blob/master/Data/RGenerated/PCAChangepointsRPE.RData">PCAChangepointsRPE.RData</a>. </p> </source>')
      )
  
  cat('\n \n')
}

P1

NULL

In the above figure, the colored vertical lines correspond to the following. First, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Second, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Third, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

The threshold based changepoint in the subjective RPE is located at: 59. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. The results from the analysis above can be found at: PCAChangepointsRPE.RData.

P2

NULL

In the above figure, the colored vertical lines correspond to the following. First, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Second, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Third, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

The threshold based changepoint in the subjective RPE is located at: 53. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. The results from the analysis above can be found at: PCAChangepointsRPE.RData.

P3

NULL

In the above figure, the colored vertical lines correspond to the following. First, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Second, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Third, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

The threshold based changepoint in the subjective RPE is located at: 35. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 23, 47, 88. The results from the analysis above can be found at: PCAChangepointsRPE.RData.

P4

NULL

In the above figure, the colored vertical lines correspond to the following. First, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Second, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Third, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

The threshold based changepoint in the subjective RPE is located at: 23. In addition, the number of changepoints based on the RPE values for this subject is equal to: 1. These changepoints are located at: 41. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 2. These changepoints are located at: 0, 53. The results from the analysis above can be found at: PCAChangepointsRPE.RData.

P5

NULL

In the above figure, the colored vertical lines correspond to the following. First, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Second, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Third, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

The threshold based changepoint in the subjective RPE is located at: 23. In addition, the number of changepoints based on the RPE values for this subject is equal to: 1. These changepoints are located at: 47. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. The results from the analysis above can be found at: PCAChangepointsRPE.RData.

P6

NULL

In the above figure, the colored vertical lines correspond to the following. First, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Second, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Third, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

The threshold based changepoint in the subjective RPE is located at: 76. In addition, the number of changepoints based on the RPE values for this subject is equal to: 1. These changepoints are located at: 47. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. The results from the analysis above can be found at: PCAChangepointsRPE.RData.

P7

NULL

In the above figure, the colored vertical lines correspond to the following. First, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Second, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Third, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

The threshold based changepoint in the subjective RPE is located at: 18. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 23, 47, 65. The results from the analysis above can be found at: PCAChangepointsRPE.RData.

P8

NULL

In the above figure, the colored vertical lines correspond to the following. First, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Second, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Third, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

The threshold based changepoint in the subjective RPE is located at: 59. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 41, 65, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. The results from the analysis above can be found at: PCAChangepointsRPE.RData.

P9

NULL

In the above figure, the colored vertical lines correspond to the following. First, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Second, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Third, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

The threshold based changepoint in the subjective RPE is located at: 23. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 35, 53, 82. The results from the analysis above can be found at: PCAChangepointsRPE.RData.

P10

NULL

In the above figure, the colored vertical lines correspond to the following. First, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Second, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Third, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

The threshold based changepoint in the subjective RPE is located at: 18. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 47, 65. The results from the analysis above can be found at: PCAChangepointsRPE.RData.

P11

NULL

In the above figure, the colored vertical lines correspond to the following. First, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Second, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Third, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

The threshold based changepoint in the subjective RPE is located at: 100. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. The results from the analysis above can be found at: PCAChangepointsRPE.RData.

P12

NULL

In the above figure, the colored vertical lines correspond to the following. First, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Second, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Third, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

The threshold based changepoint in the subjective RPE is located at: 82. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. The results from the analysis above can be found at: PCAChangepointsRPE.RData.

P13

NULL

In the above figure, the colored vertical lines correspond to the following. First, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Second, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Third, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

The threshold based changepoint in the subjective RPE is located at: 59. In addition, the number of changepoints based on the RPE values for this subject is equal to: 1. These changepoints are located at: 41. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. The results from the analysis above can be found at: PCAChangepointsRPE.RData.

P14

NULL

In the above figure, the colored vertical lines correspond to the following. First, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Second, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Third, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

The threshold based changepoint in the subjective RPE is located at: 41. In addition, the number of changepoints based on the RPE values for this subject is equal to: 2. These changepoints are located at: 41, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. The results from the analysis above can be found at: PCAChangepointsRPE.RData.

P15

NULL

In the above figure, the colored vertical lines correspond to the following. First, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Second, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Third, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

The threshold based changepoint in the subjective RPE is located at: 29. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 2. These changepoints are located at: 0, 47. The results from the analysis above can be found at: PCAChangepointsRPE.RData.

save(pca_subjects, pca_subjects_loadings, pc_prop_var_explained,
     changepoints_pca_ecp, 
     changepoints_subj_thr, changepoints_subj_ecp, changepoints_cusum_subj_ecp,
     file = "./Data/RGenerated/PCAChangepointsRPE.RData.RData")

5 Multivariate Changepoint Detection

To be filled later.

5.1 The Nonparameteric Approach of Matterson & James (ECP)

In this section, we will examine the use of the approach of Matteson and James (2014) for multiple change point analysis of our trivariate data. Our analysis takes advantage of their R package titled ecp, which is described in more details in the following paper: James and Matteson (2014). The multivariate changepoint method will be first tested using the median filtered data and then using the CUSUM of the median filtered data.

5.1.1 Using the Median Filtered Data

To apply the ecp approach of Matteson and James (2014) on the median filtered data, we need to use a window size (which reflects the smallest size of a shift that is to be detected). From a biomechanics perspective, we expect that the change of performance due to fatigue will be sustained at least for several minutes (until the person adapts and/or rests). Thus, the size of the window (\(\small m_{ecp}\)) should be \(\small m_{ecp} \ge 60\). In our analysis, we tried the following window sizes: 60, 120, 180, 300, and 600. For most of the participants, the number and/or location of change were almost similar. For demonstration purpose, we use \(\small m_{ecp} = 120\) below.

  load(file = "./Data/RGenerated/MedianFilteredData.RData")
  pen <- function(x) -length(x) #Equally penalizes every additional changepoint
  window.size <- 120
  changepoints_medianf <- list()

  for (i in 1:15) {
      df <- get(paste0("subject",i,"_medianf"))
      # Making the DF divisiable by the window size
      # (i.e., we are removing the remainder so that our chunks are equally sized)
      rows.to.keep <- nrow(df[,2:4]) %/% window.size
      remainder <- nrow(df[,2:4]) %% window.size
      df.MultivariateChangepoint<- df[1:(nrow(df)-remainder),
                                      2:4] %>% as.matrix()
      
      # Utilizing the e.agglo() from the ecp package
      mem <- rep(1:rows.to.keep, each=window.size)
      y = e.agglo(X=df.MultivariateChangepoint,
                  member= mem,
                  alpha = 1,
                  penalty = pen)
      fatigue_from_ECP <- y$estimates # Returns observation number
      changepoints_medianf[[i]] <- df[fatigue_from_ECP,1]
      
      cat('####',paste0("P", i), "{-}",'\n')
      df_transformed <- melt(get(paste0("subject",i,"_medianf")),
                         id.vars = "time.from.start",
                         measure.vars=c("scaled.stride.len",
                                        "scaled.stride.height",
                                        "stride.duration"
                                        )
                         ) # ggplot data needs to be tall
      
      assign(paste0("g",i),
         ggplot(data = df_transformed,
                aes(x=time.from.start, y=value, group=variable,
                    color=variable,shape=variable)) + 
           geom_line() + theme_bw() + 
           ggtitle("Changepoints for the Median Filtered Data (at vertical black lines)") + 
           theme(legend.position="none",
                 #axis.text.x=element_text(angle=90,hjust=1),
                 plot.title = element_text(hjust = 0.5)) +
           facet_wrap(~variable,nrow=3, scales = "free_y") +
           geom_vline(xintercept= changepoints_medianf[[i]])
         )
      
      print(get(paste0("g",i)))
  
  cat('\n')
  
  cat(paste0('<source> <p> Based on the <b>ecp package</b>, the number of changepoints for participant ', i,
             ' is equal to: ',
             length(changepoints_medianf[[i]]),
             '. These changepoints are located at: ',
             paste(round(changepoints_medianf[[i]]), collapse = ", "),
             '. The results from the analysis above can be found at: <a href="https://github.com/fmegahed/fatigue-changepoint/blob/master/Data/RGenerated/ECPChangePointsMFData.RData">ECPChangePointsMFData.RData</a>. </p> </source>')
      )
  cat('\n \n')
  }

P1

Based on the ecp package, the number of changepoints for participant 1 is equal to: 3. These changepoints are located at: 1718, 6262, 9555. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P2

Based on the ecp package, the number of changepoints for participant 2 is equal to: 2. These changepoints are located at: 408, 6101. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P3

Based on the ecp package, the number of changepoints for participant 3 is equal to: 2. These changepoints are located at: 2761, 6762. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P4

Based on the ecp package, the number of changepoints for participant 4 is equal to: 4. These changepoints are located at: 1, 1521, 6545, 10152. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P5

Based on the ecp package, the number of changepoints for participant 5 is equal to: 2. These changepoints are located at: 3004, 3954. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P6

Based on the ecp package, the number of changepoints for participant 6 is equal to: 4. These changepoints are located at: 2563, 4393, 5694, 9828. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P7

Based on the ecp package, the number of changepoints for participant 7 is equal to: 3. These changepoints are located at: 2, 5625, 6002. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P8

Based on the ecp package, the number of changepoints for participant 8 is equal to: 2. These changepoints are located at: 1372, 5794. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P9

Based on the ecp package, the number of changepoints for participant 9 is equal to: 3. These changepoints are located at: 2, 9171, 10162. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P10

Based on the ecp package, the number of changepoints for participant 10 is equal to: 3. These changepoints are located at: 16, 4930, 10011. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P11

Based on the ecp package, the number of changepoints for participant 11 is equal to: 3. These changepoints are located at: 2, 2030, 10064. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P12

Based on the ecp package, the number of changepoints for participant 12 is equal to: 5. These changepoints are located at: 30, 3889, 4217, 9462, 9610. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P13

Based on the ecp package, the number of changepoints for participant 13 is equal to: 3. These changepoints are located at: 2, 1221, 10005. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P14

Based on the ecp package, the number of changepoints for participant 14 is equal to: 3. These changepoints are located at: 2475, 3949, 8347. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

P15

Based on the ecp package, the number of changepoints for participant 15 is equal to: 4. These changepoints are located at: 2, 6073, 9374, 10228. The results from the analysis above can be found at: ECPChangePointsMFData.RData.

save(changepoints_medianf,
       file = "./Data/RGenerated/ECPChangePointsMFData.RData")

5.1.2 Using the Median Filtered Differenced Data

Here, we apply the ecp approach of Matteson and James (2014) on the median filtered differenced data (as opposed to the median filtered timeseries). For demonstration purpose, we use \(\small m_{ecp} = 120\) below. ** Note that, here, we set the value of \(\small \alpha=2\) as opposed to 1 (elsewhere) since we are looking for changes in the variation instead of the mean. See their paper for more details.

  load(file = "./Data/RGenerated/MedianFilteredData.RData")
  pen <- function(x) -length(x) #Equally penalizes every additional changepoint
  window.size <- 120
  changepoints_medianf_diff <- list()

  for (i in 1:15) {
      df <- get(paste0("subject",i,"_median_diff_f"))
      # Making the DF divisiable by the window size
      # (i.e., we are removing the remainder so that our chunks are equally sized)
      rows.to.keep <- nrow(df[,2:4]) %/% window.size
      remainder <- nrow(df[,2:4]) %% window.size
      df.MultivariateChangepoint<- df[1:(nrow(df)-remainder),
                                      2:4] %>% as.matrix()
      
      # Utilizing the e.agglo() from the ecp package
      mem <- rep(1:rows.to.keep, each=window.size)
      y = e.agglo(X=df.MultivariateChangepoint,
                  member= mem,
                  alpha = 2,
                  penalty = pen)
      fatigue_from_ECP <- y$estimates
      changepoints_medianf_diff[[i]] <- df[fatigue_from_ECP,1]
      
      cat('####',paste0("P", i), "{-}",'\n')
      df_transformed <- melt(get(paste0("subject",i,"_median_diff_f")),
                         id.vars = "diff.time.from.start",
                         measure.vars=c("diff.scaled.stride.len",
                                        "diff.scaled.stride.height",
                                        "diff.stride.duration"
                                        )
                         ) # ggplot data needs to be tall
      
      assign(paste0("g",i),
         ggplot(data = df_transformed,
                aes(x=diff.time.from.start, y=value, group=variable,
                    color=variable,shape=variable)) + 
           geom_line() + theme_bw() + 
           ggtitle("Changepoints for the Median Filtered Differenced Data (at vertical black lines)") + 
           theme(legend.position="none",
                 #axis.text.x=element_text(angle=90,hjust=1),
                 plot.title = element_text(hjust = 0.5)) +
           facet_wrap(~variable,nrow=3, scales = "free_y") +
           geom_vline(xintercept= changepoints_medianf_diff[[i]])
         )
      
      print(get(paste0("g",i)))
  
  cat('\n')
  
  cat(paste0('<source> <p> Based on the <b>ecp package</b>, the number of changepoints for the median filtered differenced data for participant ', i,
             ' is equal to: ',
             length(changepoints_medianf_diff[[i]]),
             '. These changepoints are located at: ',
             paste(round(changepoints_medianf_diff[[i]]), collapse = ", "),
             '. The results from the analysis above can be found at: <a href="https://github.com/fmegahed/fatigue-changepoint/blob/master/Data/RGenerated/ECPChangePointsDiffMFData.RData">ECPChangePointsDiffMFData.RData</a>. </p> </source>')
      )
  
  cat('\n \n')

  }

P1

Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 1 is equal to: 1. These changepoints are located at: 2946. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.

P2

Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 2 is equal to: 1. These changepoints are located at: 9171. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.

P3

Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 3 is equal to: 1. These changepoints are located at: 9401. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.

P4

Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 4 is equal to: 1. These changepoints are located at: 4174. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.

P5

Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 5 is equal to: 1. These changepoints are located at: 3341. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.

P6

Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 6 is equal to: 1. These changepoints are located at: 9996. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.

P7

Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 7 is equal to: 1. These changepoints are located at: 4452. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.

P8

Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 8 is equal to: 1. These changepoints are located at: 8830. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.

P9

Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 9 is equal to: 1. These changepoints are located at: 4584. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.

P10

Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 10 is equal to: 1. These changepoints are located at: 4931. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.

P11

Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 11 is equal to: 1. These changepoints are located at: 1716. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.

P12

Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 12 is equal to: 1. These changepoints are located at: 3890. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.

P13

Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 13 is equal to: 2. These changepoints are located at: 3, 10006. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.

P14

Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 14 is equal to: 1. These changepoints are located at: 3950. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.

P15

Based on the ecp package, the number of changepoints for the median filtered differenced data for participant 15 is equal to: 1. These changepoints are located at: 6412. The results from the analysis above can be found at: ECPChangePointsDiffMFData.RData.

save(changepoints_medianf_diff,
       file = "./Data/RGenerated/ECPChangePointsDiffMFData.RData")

5.1.3 Using the CUSUM of the Median Filtered Data

Here, we apply the ecp approach of Matterson and James (2014) on the cusums of the median filtered data (as opposed to the median filtered timeseries). For demonstration purpose, we use \(\small m_{ecp} = 120\) below.

load(file = "./Data/RGenerated/CusumData.RData")
  pen <- function(x) -length(x) #Equally penalizes every additional changepoint
  window.size <- 120
  changepoints_cusum <- list()

  for (i in 1:15) {
      df <- get(paste0("subject",i,"_cusum"))
      # Making the DF divisiable by the window size
      # (i.e., we are removing the remainder so that our chunks are equally sized)
      rows.to.keep <- nrow(df[,2:4]) %/% window.size
      remainder <- nrow(df[,2:4]) %% window.size
      df.MultivariateChangepoint<- df[1:(nrow(df)-remainder),
                                      2:4] %>% as.matrix()
      
      # Utilizing the e.agglo() from the ecp package
      mem <- rep(1:rows.to.keep, each=window.size)
      y = e.agglo(X=df.MultivariateChangepoint,
                  member= mem,
                  alpha = 1,
                  penalty = pen)
      fatigue_from_CUSUM_ECP <- y$estimates
      changepoints_cusum[[i]] <- df[fatigue_from_CUSUM_ECP,1]
      
      cat('####',paste0("P", i), "{-}",'\n')
      df_transformed <- melt(get(paste0("subject",i,"_cusum")),
                         id.vars = "time.from.start",
                         measure.vars=c("scaled.stride.len",
                                        "scaled.stride.height",
                                        "stride.duration"
                                        )
                         ) # ggplot data needs to be tall
      
      assign(paste0("g",i),
         ggplot(data = df_transformed,
                aes(x=time.from.start, y=value, group=variable,
                    color=variable,shape=variable)) + 
           geom_line() + theme_bw() + 
           ggtitle("Changepoints for the CUSUMs of the Median Filtered Data (vertical black lines)") + 
           theme(legend.position="none",
                 #axis.text.x=element_text(angle=90,hjust=1),
                 plot.title = element_text(hjust = 0.5)) +
           facet_wrap(~variable,nrow=3, scales = "free_y") +
           geom_vline(xintercept= changepoints_cusum[[i]])
         )
      
      print(get(paste0("g",i)))
  
  cat('\n')
  
  cat(paste0('<source> <p> Based on the <b>ecp package</b>, the number of changepoints for the cusums of the median filtered data for participant ', i,
             ' is equal to: ',
             length(changepoints_cusum[[i]]),
             '. These changepoints are located at: ',
             paste(round(changepoints_cusum[[i]]), collapse = ", "),
             '. The results from the analysis above can be found at: <a href="https://github.com/fmegahed/fatigue-changepoint/blob/master/Data/RGenerated/ECPChangePointsCUSUM.RData">ECPChangePointsCUSUM.RData</a>. </p> </source>')
      )
  cat('\n \n')
  }

P1

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 1 is equal to: 3. These changepoints are located at: 2, 7515, 10193. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P2

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 2 is equal to: 2. These changepoints are located at: 4563, 7486. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P3

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 3 is equal to: 3. These changepoints are located at: 1376, 3799, 9664. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P4

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 4 is equal to: 4. These changepoints are located at: 822, 4868, 6545, 9770. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P5

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 5 is equal to: 2. These changepoints are located at: 1355, 3335. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P6

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 6 is equal to: 2. These changepoints are located at: 163, 8285. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P7

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 7 is equal to: 4. These changepoints are located at: 2, 1179, 2923, 6002. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P8

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 8 is equal to: 5. These changepoints are located at: 2, 1962, 4769, 6230, 10017. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P9

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 9 is equal to: 2. These changepoints are located at: 6246, 9529. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P10

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 10 is equal to: 2. These changepoints are located at: 1385, 8355. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P11

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 11 is equal to: 2. These changepoints are located at: 1404, 8618. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P12

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 12 is equal to: 4. These changepoints are located at: 4080, 5331, 8617, 9462. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P13

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 13 is equal to: 4. These changepoints are located at: 502, 2179, 5996, 9065. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P14

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 14 is equal to: 3. These changepoints are located at: 1427, 3256, 6718. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

P15

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 15 is equal to: 2. These changepoints are located at: 2137, 7804. The results from the analysis above can be found at: ECPChangePointsCUSUM.RData.

save(changepoints_cusum,
       file = "./Data/RGenerated/ECPChangePointsCUSUM.RData")

5.2 The Approach of Capizzi and Masarotto (dfphase1)

To compare the performance of the ECP approach, we also investigated using the approach of Cappizi and Masarotto (2016) for another nonparameteric multivariate changepoint analysis method. Similar to the ecp approach, we applied their methodology on the: (a) median filtered data; and (b) cusums of the median filtered data. The results are shown in the following subsections. Note that we did not perform the analysis on the differences since we concluded that it was less informative than (a) and (b) based on our conclusions from Section 4.1.

For the parameterization of the Cappizi and Masarotto (2016) approach, we used the default parameters of the mphase1() function from the dfphase1 package with the exception of:
isolated = FALSE since we do not expect shifts to last only 120 strides.
plot = FALSE since we would like to use ggplot package instead of the default plot function.

Notes:

  1. Similar to the analysis in Section 4.1, we are using the window size of 120 (\(\small m_{dphase1}=120\)). This allows us to compare the results (in terms of the number of detected changepoints and whether these match better from a graphical perspective).

  2. Note that we have used the default \(\small lmin=5\), which means that the mphase1 method will ignore any changepoints within 5 windows of the previously detected changepoint. While this is different than the ECP method, we still have more changepoints (and thus, we have elected against changing this default setting).

5.2.1 Using the Median Filtered Data

  load(file = "./Data/RGenerated/MedianFilteredData.RData")
  changepoints_mph1_mf <- list()

  for (i in 1:15) {
    df <- get(paste0("subject",i,"_medianf"))
    window.size <- 120
    rows.to.keep <- nrow(df[,2:4]) %/% window.size
    remainder <- nrow(df[,2:4]) %% window.size
    df.MultivariateChangepoint<- df[1:(nrow(df)-remainder),
                                    2:4]
    dataholder.array <- t(df.MultivariateChangepoint) %>% 
                          array(c(3, window.size, rows.to.keep))
    y <- mphase1(dataholder.array,
                 isolated=FALSE, plot = FALSE, alpha=0.001)
    mcp <- (df[unlist(y$alasso[2]),1]*window.size) %>% round(digits = 0)
    mcp <- mcp[order(mcp)]
    mcp <- (mcp - window.size) + 1 # to make the change point at the begining of the period and not end (to be similar to the ecp method)
    changepoints_mph1_mf[[i]] <- y
    
     cat('####',paste0("P", i), "{-}",'\n')
    df_transformed <- melt(get(paste0("subject",i,"_medianf")),
                         id.vars = "time.from.start",
                         measure.vars=c("scaled.stride.len",
                                        "scaled.stride.height",
                                        "stride.duration"
                         )
  ) # ggplot data needs to be tall
  
  assign(paste0("g",i),
         ggplot(data = df_transformed,
                aes(x=time.from.start, y=value, group=variable,
                    color=variable,shape=variable)) + 
           geom_line() + theme_bw() + 
           ggtitle("Changepoints for the Median Filtered Data (w/ dfphase1 package)") + 
           theme(legend.position="none",
                 #axis.text.x=element_text(angle=90,hjust=1),
                 plot.title = element_text(hjust = 0.5)) +
           facet_wrap(~variable,nrow=3, scales = "free_y") +
           geom_vline(xintercept= mcp)
  )
  
  print(get(paste0("g",i)))
  
 cat('\n')
  
  cat(paste0('<source> <p> Based on the <b>dfphase1 package</b>, the number of changepoints for the cusums of the median filtered data for participant ', i,
             ' is equal to: ',
             length(mcp),
             '. These changepoints are located at: ',
             paste(mcp, collapse = ", "),
             '. The results from the analysis above can be found at: <a href="https://github.com/fmegahed/fatigue-changepoint/blob/master/Data/RGenerated/ChangePointsMedianFilterMphase1.RData">ChangePointsMedianFilterMphase1.RData</a>. </p> </source>')
  )
  cat('\n \n')
  }

P1

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 1 is equal to: 4. These changepoints are located at: 1270, 8122, 9121, 10231. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.

P2

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 2 is equal to: 5. These changepoints are located at: 1110, 2478, 3768, 4652, 5309. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.

P3

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 3 is equal to: 1. These changepoints are located at: 1015. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.

P4

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 4 is equal to: 8. These changepoints are located at: 773, 1513, 3988, 6087, 7567, 8201, 9036, 9774. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.

P5

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 5 is equal to: 3. These changepoints are located at: 2345, 3122, 4152. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.

P6

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 6 is equal to: 6. These changepoints are located at: 720, 1592, 2474, 3346, 4355, 5110. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.

P7

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 7 is equal to: 3. These changepoints are located at: 999, 1972, 2934. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.

P8

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 8 is equal to: 8. These changepoints are located at: 997, 1843, 3503, 5517, 6493, 7204, 8277, 8991. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.

P9

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 9 is equal to: 5. These changepoints are located at: 774, 1951, 2882, 4274, 7502. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.

P10

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 10 is equal to: 7. These changepoints are located at: 2615, 3485, 4326, 5174, 5990, 6808, 7760. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.

P11

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 11 is equal to: 6. These changepoints are located at: 1217, 2562, 5458, 7088, 8582, 9319. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.

P12

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 12 is equal to: 6. These changepoints are located at: 4668, 5336, 6286, 6952, 7934, 8628. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.

P13

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 13 is equal to: 6. These changepoints are located at: 2371, 4082, 5069, 5896, 7150, 7983. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.

P14

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 14 is equal to: 4. These changepoints are located at: 1134, 1914, 7931, 9336. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.

P15

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 15 is equal to: 6. These changepoints are located at: 1351, 2071, 3543, 4415, 6939, 9092. The results from the analysis above can be found at: ChangePointsMedianFilterMphase1.RData.

  save(changepoints_mph1_mf,
       file = "./Data/RGenerated/ChangePointsMedianFilterMphase1.RData")

5.2.2 Using the CUSUM Data

  load(file = "./Data/RGenerated/CusumData.RData")
  changepoints_mph1_cusum <- list()

  for (i in 1:15) {
    df <- get(paste0("subject",i,"_cusum"))
    window.size <- 120
    rows.to.keep <- nrow(df[,2:4]) %/% window.size
    remainder <- nrow(df[,2:4]) %% window.size
    df.MultivariateChangepoint<- df[1:(nrow(df)-remainder),
                                    2:4]
    dataholder.array <- t(df.MultivariateChangepoint) %>% 
                          array(c(3, window.size, rows.to.keep))
    y <- mphase1(dataholder.array,
                 isolated=FALSE, plot = FALSE, alpha=0.001)
    mcp <- (df[unlist(y$alasso[2]),1]*window.size) %>% round(digits = 0)
    mcp <- mcp[order(mcp)]
    mcp <- (mcp - window.size) + 1 # to make the change point at the begining of the period and not end (to be similar to the ecp method)
    changepoints_mph1_cusum[[i]] <- y
    
     cat('####',paste0("P", i), "{-}",'\n')
  df_transformed <- melt(get(paste0("subject",i,"_cusum")),
                         id.vars = "time.from.start",
                         measure.vars=c("scaled.stride.len",
                                        "scaled.stride.height",
                                        "stride.duration"
                         )
  ) # ggplot data needs to be tall
  
  assign(paste0("g",i),
         ggplot(data = df_transformed,
                aes(x=time.from.start, y=value, group=variable,
                    color=variable,shape=variable)) + 
           geom_line() + theme_bw() + 
           ggtitle("Changepoints for the CUSUMs of the Median Filtered Data (w/ dfphase1 package)") + 
           theme(legend.position="none",
                 #axis.text.x=element_text(angle=90,hjust=1),
                 plot.title = element_text(hjust = 0.5)) +
           facet_wrap(~variable,nrow=3, scales = "free_y") +
           geom_vline(xintercept= mcp)
  )
  
  print(get(paste0("g",i)))
  
 cat('\n')
  
  cat(paste0('<source> <p> Based on the <b>dfphase1 package</b>, the number of changepoints for the cusums of the median filtered data for participant ', i,
             ' is equal to: ',
             length(mcp),
             '. These changepoints are located at: ',
             paste(mcp, collapse = ", "),
             '. The results from the analysis above can be found at: <a href="https://github.com/fmegahed/fatigue-changepoint/blob/master/Data/RGenerated/ChangePointsCusumMphase1.RData">ChangePointsCusumMphase1.RData</a>. </p> </source>')
  )
  cat('\n \n')
  }

P1

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 1 is equal to: 5. These changepoints are located at: 1270, 7619, 8502, 9369, 10231. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.

P2

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 2 is equal to: 4. These changepoints are located at: 1110, 2048, 3548, 5091. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.

P3

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 3 is equal to: 1. These changepoints are located at: 1015. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.

P4

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 4 is equal to: 7. These changepoints are located at: 773, 3604, 5057, 7242, 8095, 9036, 9671. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.

P5

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 5 is equal to: 2. These changepoints are located at: 2474, 3504. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.

P6

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 6 is equal to: 5. These changepoints are located at: 1094, 2092, 3346, 4226, 5110. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.

P7

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 7 is equal to: 2. These changepoints are located at: 1693, 3204. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.

P8

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 8 is equal to: 7. These changepoints are located at: 753, 1588, 5130, 5894, 6848, 7561, 8636. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.

P9

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 9 is equal to: 5. These changepoints are located at: 1032, 1819, 3016, 4274, 7936. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.

P10

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 10 is equal to: 6. These changepoints are located at: 3237, 3973, 4687, 5874, 6693, 7524. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.

P11

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 11 is equal to: 6. These changepoints are located at: 1461, 2193, 4955, 6959, 7833, 8706. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.

P12

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 12 is equal to: 5. These changepoints are located at: 4327, 5116, 6399, 7707, 8740. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.

P13

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 13 is equal to: 6. These changepoints are located at: 1461, 3265, 4501, 6035, 7427, 8259. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.

P14

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 14 is equal to: 5. These changepoints are located at: 1246, 1914, 3549, 8648, 9336. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.

P15

Based on the dfphase1 package, the number of changepoints for the cusums of the median filtered data for participant 15 is equal to: 7. These changepoints are located at: 984, 1951, 2806, 3666, 4675, 7203, 8961. The results from the analysis above can be found at: ChangePointsCusumMphase1.RData.

  save(changepoints_mph1_cusum,
       file = "./Data/RGenerated/ChangePointsCusumMphase1.RData")

5.3 Comments on the ECP and dfphase1 Methods

Based on the results in Sections 4.1 and 4.2, the approach of Matterson and James (2016) is preferred since it:

  1. Results in a more limited number of changepoints, highlighting more (practically) significant changes in the gait parameters. This result was somewhat surprising since we used the default settings for the approach of Capizzi and Masarotto (2016) throught the default mphase1(), which prevents having changepoints within 5 windows. Note that this result holds true even when we adjusted the \(\small \alpha =0.001\) instead of the default of \(\small \alpha =0.05\):) for the mphase1() method (without adjusting the ecp() method since it does not allow for adjusting the false alarm rate); and

  2. Is more computationally efficient. It had a shorter run time than the other approach.

In addition, from the analysis, the changepoints based on the CUSUM of the median filtered data are the most insightful. Specifically, one can discern how each participant’s performance change as the experiment goes on. For example, P10 has a clear fatigued performance in the end, where the cusums of the stride length and stride height are decreasing rapidly after stride 5521.

6 Comparing the ECP Changepoints with Participants’ Subjective Ratings

Throughout the experiments, participants provided their subjective ratings every 10 minutes. The ratings were based on the Borg 6-20 RPE scale. Recent literature on fatigue detection suggests that a rating of 13 can be used to estimate the onset of fatigue (see Maman et al. (2017) for a detailed analysis). The subjective ratings for each participant can be accessed at: subjective-ratings-raw.

6.1 Plots with the Median Filtered Data and their ECP Changepoints

In this section, we perform the following tasks:

  1. EDA, where we overlay the subjective ratings with the Median Filtered changepoints;

  2. A threshold based comparison, where we use \(\small RPE \ge 13\) to identify when a participant becomes fatigued and we compare that to the changepoints detected by the hybrid CUSUM-EPC approach; and

  3. Utilize the e.divisive() from the ecp package to obtain the changepoints for the Borg Ratings. Note that this function is the univariate approach for the multiple changepoint method examined in Section 4.1.

RatingThresh <- 13  # based on Maman et al. 2017 
RPE = read.csv("https://raw.githubusercontent.com/fmegahed/fatigue-changepoint/master/Data/Raw/RPE.csv") # to read subjective rating of RPE data from GitHub'
colnames(RPE)[1] <- "time.from.start"
df_RPE_transformed <- melt(RPE,
                           id.vars = "time.from.start",
                           measure.vars = colnames(RPE)[2:16])
df_RPE_transformed[,2] <- paste0(df_RPE_transformed[,2],".RPE")
# to load previously generated data and changepoints 
load(file = "./Data/RGenerated/MedianFilteredData.RData")
load(file = "./Data/RGenerated/ECPChangePointsMFData.RData")
changepoints_subj_ecp <- list()
changepoints_subj_thr <- list()
changepoints_cusum_subj_ecp <- list()

for (i in 1:15) {
  cumsum_RPE <- data.frame(matrix(ncol=3, nrow=18)) # need to initialize every time
  colnames(cumsum_RPE) <- c("time.from.start","variable","value")
  correct_subject_rows <- which(df_RPE_transformed[,2]==paste0("Subject.",i,".RPE"))
  dataholder <- df_RPE_transformed[correct_subject_rows,]
  # Computing the CUSUMs of the RPEs
  cumsum_RPE[,1] <- dataholder[,1]
  cumsum_RPE[,2] <- paste0(dataholder[,2],".CUSUM")
  cumsum_RPE[,3] <- cumsum(dataholder[,3]-13)
  cumsum_RPE <- na.omit(cumsum_RPE)
  
  df_transformed <- rbind(df_transformed,df_RPE_transformed[correct_subject_rows,],
                            cumsum_RPE)
  
  # -------- Calculation of Fatigue based on RPE>= 13 -----------------
  fatigue_from_RPE_threshold <- which(RPE[,i+1]>=RatingThresh) %>% 
                              min() # returns window number
  changepoints_subj_thr[[i]] <- RPE[fatigue_from_RPE_threshold,1]
  df_RPE<-data.frame(d1=RPE[1],t1=RPE[i+1]) %>% na.omit() # to make dataframe of subjective ratings for ggplot
  
  #------- Estimation of Fatigue Changepoints based on ECP of the RPE Data --
  y <- e.divisive(df_RPE,min.size = 2)$estimates
  fatigue_from_ecp_changepoint = y[2:(length(y)-1)] # removing the first and last points as trivials changepoints
  changepoints_subj_ecp[[i]] <- df_RPE[fatigue_from_ecp_changepoint,1]
  
  # ------ Repeating the above analysis for the CUSUMs of the RPE --------
  y_cumsum <- e.divisive(cumsum_RPE[,c(1,3)],min.size = 2)$estimates
  fatigue_ecp_CusumRPE <- y_cumsum[2:length(y_cumsum)-1]
  changepoints_cusum_subj_ecp[[i]] <- cumsum_RPE[fatigue_ecp_CusumRPE, 1]
  
  cat('###',paste0("P", i), "{-}",'\n')
  df_transformed <- melt(get(paste0("subject",i,"_medianf")),
                       id.vars = "time.from.start",
                       measure.vars=c("scaled.stride.len",
                                      "scaled.stride.height",
                                      "stride.duration"
                                      )
                         )
  df_transformed <- rbind(df_transformed,df_RPE_transformed[correct_subject_rows,],
                            cumsum_RPE)
      
  assign(paste0("g",i),
        ggplot(data = df_transformed,
               aes(x=time.from.start, y=value, colour = variable)) + 
          geom_line(size=1) +
          theme_bw() +
          scale_colour_manual(values = c("red", "green", "blue","black","grey40")) +
          facet_wrap(~variable,nrow=5, scales = "free_y") +
          # labs(title = paste("A paneled time-series plot for Participant",i)) + 
          geom_vline(xintercept= changepoints_medianf[[i]], 
                     show.legend = T, size=2) + 
          geom_vline(xintercept= changepoints_subj_thr[[i]],
                     linetype="solid", color="steelblue1",
                     show.legend = T, size=2) +
          geom_vline(xintercept= changepoints_subj_ecp[[i]],
                     linetype="dotted", color="gray80",
                     show.legend =T, size=2) +
          geom_vline(xintercept= changepoints_cusum_subj_ecp[[i]],
                     linetype="dashed", color="orange1",
                     show.legend =T, size=3) +
          theme(legend.position="none",
                 plot.title = element_text(hjust = 0.5))
      )
      
      print(get(paste0("g",i)))
  
  cat('\n \n')
  
  cat(paste0('<source> <p> In the above figure, the colored vertical lines correspond to the following.  First, the <b> black  lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. </b> These changepoints are what we obtained in Section 4.1.3. Second, the <font color= #63B8FF> <b>steelblue line  correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. </b> </font> Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the <b> <font color=#CCCCCC> gray lines  denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. </font> </b>',
'. Fourth, the <b> <font color=#FFA500> orange lines  are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values. </font> </b> \n \n'))
  
  cat(paste0('<source> <p> Based on the <b>ecp package, the number of changepoints for the the median filtered data for participant ', i,
             ' is equal to: ',
             length(changepoints_medianf[[i]]),
             '. These changepoints are located at: ',
             paste(round(changepoints_medianf[[i]]), collapse = ", "),
             '.  <font color= #63B8FF> The threshold based changepoint in the subjective RPE is located at: ',
             paste(changepoints_subj_thr[[i]]),
             '. </font> </b> In addition, <b> <font color=#CCCCCC> the number of changepoints based on the RPE values for this subject is equal to: ',
             length(changepoints_subj_ecp[[i]]),
             ". These changepoints are located at: ",
             paste(changepoints_subj_ecp[[i]], collapse = ", "),
             ". </font> </b> Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. <b> <font color=#FFA500> The number of these changepoints is equal to: ",
             length(changepoints_cusum_subj_ecp[[i]]),
             ". These changepoints are located at: ",
             paste(changepoints_cusum_subj_ecp[[i]], collapse = ", "),
             '. </b> </font> The results from the analysis above can be found at: <a href="https://github.com/fmegahed/fatigue-changepoint/blob/master/Data/RGenerated/ChangepointsRPE.RData">ChangepointsRPE.RData</a>. </p> </source>')
      )
  
  cat('\n \n')
  
  }

P1

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the the median filtered data for participant 1 is equal to: 3. These changepoints are located at: 1718, 6262, 9555. The threshold based changepoint in the subjective RPE is located at: 59. In addition, the number of changepoints based on the RPE values for this subject is equal to: 1. These changepoints are located at: 47. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. The results from the analysis above can be found at: ChangepointsRPE.RData.

P2

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the the median filtered data for participant 2 is equal to: 2. These changepoints are located at: 408, 6101. The threshold based changepoint in the subjective RPE is located at: 53. In addition, the number of changepoints based on the RPE values for this subject is equal to: 1. These changepoints are located at: 47. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. The results from the analysis above can be found at: ChangepointsRPE.RData.

P3

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the the median filtered data for participant 3 is equal to: 2. These changepoints are located at: 2761, 6762. The threshold based changepoint in the subjective RPE is located at: 35. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 23, 47, 88. The results from the analysis above can be found at: ChangepointsRPE.RData.

P4

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the the median filtered data for participant 4 is equal to: 4. These changepoints are located at: 1, 1521, 6545, 10152. The threshold based changepoint in the subjective RPE is located at: 23. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 41, 65, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 2. These changepoints are located at: 0, 53. The results from the analysis above can be found at: ChangepointsRPE.RData.

P5

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the the median filtered data for participant 5 is equal to: 2. These changepoints are located at: 3004, 3954. The threshold based changepoint in the subjective RPE is located at: 23. In addition, the number of changepoints based on the RPE values for this subject is equal to: 1. These changepoints are located at: 47. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. The results from the analysis above can be found at: ChangepointsRPE.RData.

P6

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the the median filtered data for participant 6 is equal to: 4. These changepoints are located at: 2563, 4393, 5694, 9828. The threshold based changepoint in the subjective RPE is located at: 76. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. The results from the analysis above can be found at: ChangepointsRPE.RData.

P7

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the the median filtered data for participant 7 is equal to: 3. These changepoints are located at: 2, 5625, 6002. The threshold based changepoint in the subjective RPE is located at: 18. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 23, 47, 65. The results from the analysis above can be found at: ChangepointsRPE.RData.

P8

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the the median filtered data for participant 8 is equal to: 2. These changepoints are located at: 1372, 5794. The threshold based changepoint in the subjective RPE is located at: 59. In addition, the number of changepoints based on the RPE values for this subject is equal to: 1. These changepoints are located at: 41. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. The results from the analysis above can be found at: ChangepointsRPE.RData.

P9

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the the median filtered data for participant 9 is equal to: 3. These changepoints are located at: 2, 9171, 10162. The threshold based changepoint in the subjective RPE is located at: 23. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 35, 53, 82. The results from the analysis above can be found at: ChangepointsRPE.RData.

P10

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the the median filtered data for participant 10 is equal to: 3. These changepoints are located at: 16, 4930, 10011. The threshold based changepoint in the subjective RPE is located at: 18. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 2. These changepoints are located at: 0, 65. The results from the analysis above can be found at: ChangepointsRPE.RData.

P11

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the the median filtered data for participant 11 is equal to: 3. These changepoints are located at: 2, 2030, 10064. The threshold based changepoint in the subjective RPE is located at: 100. In addition, the number of changepoints based on the RPE values for this subject is equal to: 2. These changepoints are located at: 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. The results from the analysis above can be found at: ChangepointsRPE.RData.

P12

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the the median filtered data for participant 12 is equal to: 5. These changepoints are located at: 30, 3889, 4217, 9462, 9610. The threshold based changepoint in the subjective RPE is located at: 82. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. The results from the analysis above can be found at: ChangepointsRPE.RData.

P13

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the the median filtered data for participant 13 is equal to: 3. These changepoints are located at: 2, 1221, 10005. The threshold based changepoint in the subjective RPE is located at: 59. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 41, 65, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. The results from the analysis above can be found at: ChangepointsRPE.RData.

P14

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the the median filtered data for participant 14 is equal to: 3. These changepoints are located at: 2475, 3949, 8347. The threshold based changepoint in the subjective RPE is located at: 41. In addition, the number of changepoints based on the RPE values for this subject is equal to: 1. These changepoints are located at: 41. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. The results from the analysis above can be found at: ChangepointsRPE.RData.

P15

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the the median filtered data for participant 15 is equal to: 4. These changepoints are located at: 2, 6073, 9374, 10228. The threshold based changepoint in the subjective RPE is located at: 29. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 23, 47, 88. The results from the analysis above can be found at: ChangepointsRPE.RData.

save(changepoints_subj_ecp, changepoints_subj_thr, changepoints_cusum_subj_ecp,
     file = "./Data/RGenerated/ChangepointsRPE.RData")

6.2 Plots with the CUSUM of the Median Filtered Data and the ECP-CUSUM Changepoints

In this section, we perform the following tasks:

  1. EDA, where we overlay the subjective ratings with the CUSUM changepoints;

  2. A threshold based comparison, where we use \(\small RPE \ge 13\) to identify when a participant becomes fatigued and we compare that to the changepoints detected by the hybrid CUSUM-EPC approach; and

  3. Utilize the e.divisive() from the ecp package to obtain the changepoints for the Borg Ratings. Note that this function is the univariate approach for the multiple changepoint method examined in Section 4.1.

RatingThresh <- 13  # based on Maman et al. 2017 
RPE = read.csv("https://raw.githubusercontent.com/fmegahed/fatigue-changepoint/master/Data/Raw/RPE.csv") # to read subjective rating of RPE data from GitHub'
colnames(RPE)[1] <- "time.from.start"
df_RPE_transformed <- melt(RPE,
                           id.vars = "time.from.start",
                           measure.vars = colnames(RPE)[2:16])
df_RPE_transformed[,2] <- paste0(df_RPE_transformed[,2],".RPE")
# to load previously generated data and changepoints 
load(file = "./Data/RGenerated/CusumData.RData")
load(file = "./Data/RGenerated/ECPChangePointsCUSUM.RData")
changepoints_subj_ecp <- list()
changepoints_subj_thr <- list()
changepoints_cusum_subj_ecp <- list()

for (i in 1:15) {
  cumsum_RPE <- data.frame(matrix(ncol=3, nrow=18)) # need to initialize every time
  colnames(cumsum_RPE) <- c("time.from.start","variable","value")
  correct_subject_rows <- which(df_RPE_transformed[,2]==paste0("Subject.",i,".RPE"))
  dataholder <- df_RPE_transformed[correct_subject_rows,]
  # Computing the CUSUMs of the RPEs
  cumsum_RPE[,1] <- dataholder[,1]
  cumsum_RPE[,2] <- paste0(dataholder[,2],".CUSUM")
  cumsum_RPE[,3] <- cumsum(dataholder[,3]-13)
  cumsum_RPE <- na.omit(cumsum_RPE)
  
  
  # -------- Calculation of Fatigue based on RPE>= 13 -----------------
  fatigue_from_RPE_threshold <- which(RPE[,i+1]>=RatingThresh) %>% 
                              min() # returns window number
  changepoints_subj_thr[[i]] <- RPE[fatigue_from_RPE_threshold,1]
  df_RPE<-data.frame(d1=RPE[1],t1=RPE[i+1]) %>% na.omit() # to make dataframe of subjective ratings for ggplot
  
  #------- Estimation of Fatigue Changepoints based on ECP of the RPE Data --
  y <- e.divisive(df_RPE,min.size = 2)$estimates
  fatigue_from_ecp_changepoint = y[2:(length(y)-1)] # removing the first and last points as trivials changepoints
  changepoints_subj_ecp[[i]] <- df_RPE[fatigue_from_ecp_changepoint,1]
  
  # ------ Repeating the above analysis for the CUSUMs of the RPE --------
  y_cumsum <- e.divisive(cumsum_RPE[,c(1,3)],min.size = 2)$estimates
  fatigue_ecp_CusumRPE <- y_cumsum[2:length(y_cumsum)-1]
  changepoints_cusum_subj_ecp[[i]] <- cumsum_RPE[fatigue_ecp_CusumRPE, 1]
   
  cat('###',paste0("P", i), "{-}",'\n')
  df_transformed <- melt(get(paste0("subject",i,"_cusum")),
                       id.vars = "time.from.start",
                       measure.vars=c("scaled.stride.len",
                                      "scaled.stride.height",
                                      "stride.duration"
                                      )
                         )
  df_transformed <- rbind(df_transformed,df_RPE_transformed[correct_subject_rows,],
                            cumsum_RPE)
  
  assign(paste0("g",i),
        ggplot(data = df_transformed,
               aes(x=time.from.start, y=value, colour = variable)) + 
          geom_line(size=1) +
          theme_bw() +
          scale_colour_manual(values = c("red", "green", "blue","black","grey40")) +
          facet_wrap(~variable,nrow=5, scales = "free_y") +
          # labs(title = paste("A paneled time-series plot for Participant",i)) + 
          geom_vline(xintercept= changepoints_cusum[[i]], 
                     show.legend = T, size=2) + 
          geom_vline(xintercept= changepoints_subj_thr[[i]],
                     linetype="solid", color="steelblue1",
                     show.legend = T, size=2) +
          geom_vline(xintercept= changepoints_subj_ecp[[i]],
                     linetype="dotted", color="gray80",
                     show.legend =T, size=2) +
          geom_vline(xintercept= changepoints_cusum_subj_ecp[[i]],
                     linetype="dashed", color="orange1",
                     show.legend =T, size=3) +
          theme(legend.position="none",
                 plot.title = element_text(hjust = 0.5))
      )
      
      print(get(paste0("g",i)))
  
  cat('\n \n')
  
  cat(paste0('<source> <p> In the above figure, the colored vertical lines correspond to the following.  First, the <b> black  lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. </b> These changepoints are what we obtained in Section 4.1.3. Second, the <font color= #63B8FF> <b>steelblue line  correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. </b> </font> Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the <b> <font color=#CCCCCC> gray lines  denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. </font> </b>',
'. Fourth, the <b> <font color=#FFA500> orange lines  are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values. </font> </b> \n \n'))
  
  cat(paste0('<source> <p> Based on the <b>ecp package, the number of changepoints for the cusums of the median filtered data for participant ', i,
             ' is equal to: ',
             length(changepoints_cusum[[i]]),
             '. These changepoints are located at: ',
             paste(round(changepoints_cusum[[i]]), collapse = ", "),
             '.  <font color= #63B8FF> The threshold based changepoint in the subjective RPE is located at: ',
             paste(changepoints_subj_thr[[i]]),
             '. </font> </b> In addition, <b> <font color=#CCCCCC> the number of changepoints based on the RPE values for this subject is equal to: ',
             length(changepoints_subj_ecp[[i]]),
             ". These changepoints are located at: ",
             paste(changepoints_subj_ecp[[i]], collapse = ", "),
             ". </font> </b> Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. <b> <font color=#FFA500> The number of these changepoints is equal to: ",
             length(changepoints_cusum_subj_ecp[[i]]),
             ". These changepoints are located at: ",
             paste(changepoints_cusum_subj_ecp[[i]], collapse = ", "),
             '. </b> </font> Note that the results from the analysis above can be found at: <a href="https://github.com/fmegahed/fatigue-changepoint/blob/master/Data/RGenerated/ChangepointsRPE.RData">ChangepointsRPE.RData</a>. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]). </p> </source>')
      )
  
  cat('\n \n')
  
  }

P1

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 1 is equal to: 3. These changepoints are located at: 2, 7515, 10193. The threshold based changepoint in the subjective RPE is located at: 59. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P2

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 2 is equal to: 2. These changepoints are located at: 4563, 7486. The threshold based changepoint in the subjective RPE is located at: 53. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P3

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 3 is equal to: 3. These changepoints are located at: 1376, 3799, 9664. The threshold based changepoint in the subjective RPE is located at: 35. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 23, 47, 88. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P4

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 4 is equal to: 4. These changepoints are located at: 822, 4868, 6545, 9770. The threshold based changepoint in the subjective RPE is located at: 23. In addition, the number of changepoints based on the RPE values for this subject is equal to: 1. These changepoints are located at: 41. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 35, 53, 88. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P5

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 5 is equal to: 2. These changepoints are located at: 1355, 3335. The threshold based changepoint in the subjective RPE is located at: 23. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P6

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 6 is equal to: 2. These changepoints are located at: 163, 8285. The threshold based changepoint in the subjective RPE is located at: 76. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P7

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 7 is equal to: 4. These changepoints are located at: 2, 1179, 2923, 6002. The threshold based changepoint in the subjective RPE is located at: 18. In addition, the number of changepoints based on the RPE values for this subject is equal to: 1. These changepoints are located at: 47. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 2. These changepoints are located at: 0, 65. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P8

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 8 is equal to: 5. These changepoints are located at: 2, 1962, 4769, 6230, 10017. The threshold based changepoint in the subjective RPE is located at: 59. In addition, the number of changepoints based on the RPE values for this subject is equal to: 1. These changepoints are located at: 41. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P9

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 9 is equal to: 2. These changepoints are located at: 6246, 9529. The threshold based changepoint in the subjective RPE is located at: 23. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 35, 53, 82. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P10

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 10 is equal to: 2. These changepoints are located at: 1385, 8355. The threshold based changepoint in the subjective RPE is located at: 18. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 23, 47, 65. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P11

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 11 is equal to: 2. These changepoints are located at: 1404, 8618. The threshold based changepoint in the subjective RPE is located at: 100. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P12

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 12 is equal to: 4. These changepoints are located at: 4080, 5331, 8617, 9462. The threshold based changepoint in the subjective RPE is located at: 82. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P13

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 13 is equal to: 4. These changepoints are located at: 502, 2179, 5996, 9065. The threshold based changepoint in the subjective RPE is located at: 59. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 41, 65, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P14

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 14 is equal to: 3. These changepoints are located at: 1427, 3256, 6718. The threshold based changepoint in the subjective RPE is located at: 41. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 41, 65, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 3. These changepoints are located at: 0, 35, 70. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

P15

In the above figure, the colored vertical lines correspond to the following. First, the black lines are used to denote the changepoints based on the non-parameteric, multivariate, ECP approach on the CUSUMS of the Kinematic Features. These changepoints are what we obtained in Section 4.1.3. Second, the steelblue line correspond to the time from start in seconds, when the subject has a RPE that equals or exceeds 13. Note that we only indicate the first occurence when the RPE>=13 on the graphs. Third, the gray lines denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the RPE values from the subject. . Fourth, the orange lines are used to denote the univariate, nonparametric changepoints obtained from applying the e.divisive() on the CUSUM of the RPE values.

Based on the ecp package, the number of changepoints for the cusums of the median filtered data for participant 15 is equal to: 2. These changepoints are located at: 2137, 7804. The threshold based changepoint in the subjective RPE is located at: 29. In addition, the number of changepoints based on the RPE values for this subject is equal to: 3. These changepoints are located at: 23, 47, 88. Similar to our previous changepoint analysis, we also obtained the changepoints for the CUSUMs of the RPEs. The number of these changepoints is equal to: 4. These changepoints are located at: 0, 23, 47, 88. Note that the results from the analysis above can be found at: ChangepointsRPE.RData. This is the same link as that provided in the previous section since the RPEresults are identical. The change here was overlaying the RPE results with the CUSUM of the median filtered data and their corresponding changepoints from the ECP package (instead of directly with the median filtered data and their changepoints [also from the ECP package]).

save(changepoints_subj_thr,changepoints_subj_ecp, changepoints_cusum_subj_ecp,
       file = "./Data/RGenerated/ChangepointsRPE.RData")

7 Time-Series Clustering

7.1 Using %Time-from-Start to Standardize the Length of Each Participant’s Time-series

In this section, we rescale/normalize the timescale a 0-100% scale for all participants to prepare for a timeseries clustering type analysis. Specifically, we sampled the three features from the data based on a 0.05% from start. This resulted in a standardization of the number of observations within each timeseries across our 15 paricipants. Based on our increment size, each timeseries is now comprised of 2000 observations, each at 0.05% time increment from its neighouring data points.

# to load previously generated data 
load(file = "./Data/RGenerated/FeatureGeneration.RData")

medianfilt_norm <- list() 
cusum_norm <- list()

for (i in 1:15) {
  
  df <- get(paste0("subject",i,"_features")) # Getting Data from Sec.
  
  # resampling the data to 2000 samples
  nsamples = 2000
  timePerc <- seq(0, 100, by=0.05)
  timeScale <- df[nrow(df),1]/nsamples
  df_norm = df[FALSE,]
  
  for (j in 1:nsamples) {
    df_matchIsx <- match.closest(timeScale*(j-0.5),df[,1]) # finding the closest point
    df_keep <- df[df_matchIsx,]
    df_keep[,1] <- timePerc[j]
    df_norm <- rbind(df_norm, df_keep)
  }
  assign(paste0("subject",i,"_features_norm"), df_norm)
  
  # median filter  
  df <- get(paste0("subject",i,"_features_norm")) # Getting Data from Sec.
  # Obtaining the Outliers for each of the three variables
  time.from.start <- df[,1]
  scaled.stride.len <- medianFilter(df[,2], order = 29) # moving window of size 29 obs
  scaled.stride.height <- medianFilter(df[,3], order = 29) # moving window of size 29 obs
  stride.duration <- medianFilter(df[,4], order = 29) # moving window of size 29 obs
  assign(paste0("subject",i,"_medianf_norm"), 
         data.frame(cbind(time.from.start, scaled.stride.len, 
                          scaled.stride.height, stride.duration)))
  medianfilt_norm[[i]] <- get(paste0("subject",i,"_medianf_norm"))

  
  # cusum calculation
  df <- get(paste0("subject",i,"_medianf_norm"))
  means <- colMeans(df)
  df[,2] <- cumsum(df[,2]-means[2])
  df[,3] <- cumsum(df[,3]-means[3])
  df[,4] <- cumsum(df[,4]-means[4])
  assign(paste0("subject",i,"_cusum_norm"), df)  
  cusum_norm[[i]] <- get(paste0("subject",i,"_cusum_norm"))
  
  
   # Preparing the data for the Line Graph
  df_transformed <- melt(get(paste0("subject",i,"_features_norm")),
                         id.vars = "time.from.start",
                         measure.vars=c("scaled.stride.len",
                                        "scaled.stride.height",
                                        "stride.duration"
                                        )
                         ) # ggplot data needs to be tall
  
  median_df_transformed <- melt(get(paste0("subject",i,"_medianf_norm")),
                                 id.vars = "time.from.start",
                                 measure.vars=c("scaled.stride.len",
                                                "scaled.stride.height",
                                                "stride.duration"
                                                )
                                 ) # ggplot data needs to be tall
          
  cusum_df_transformed  <- melt(get(paste0("subject",i,"_cusum_norm")),
                                id.vars = "time.from.start",
                                measure.vars=c("scaled.stride.len",
                                               "scaled.stride.height",
                                               "stride.duration"
                                              )
                               ) # ggplot data needs to be tall

  assign(paste0("g",i),
         ggplot(data = df_transformed,
                aes(x=time.from.start, y=value, group=variable,
                    color=variable,shape=variable)) + 
           geom_line() + theme_bw() + 
           ggtitle(paste0("Participant ",i,": Pre-filtered Resampled Data")) + 
           theme(legend.position="none",
                 #axis.text.x=element_text(angle=90,hjust=1),
                 plot.title = element_text(hjust = 0.5)) +
           facet_wrap(~variable,nrow=3, scales = "free_y")
         )
  
   assign(paste0("h",i),
         ggplot(data =  median_df_transformed,
                aes(x=time.from.start, y=value, group=variable,
                    color=variable,shape=variable)) + 
           geom_line() + theme_bw() + 
           ggtitle("Resampled data post the application of the median filter") + 
           theme(legend.position="none",
                 #axis.text.x=element_text(angle=90,hjust=1),
                 plot.title = element_text(hjust = 0.5)) +
           facet_wrap(~variable,nrow=3, scales = "free_y")
         )
  
  assign(paste0("I",i),
         ggplot(data = cusum_df_transformed,
                aes(x=time.from.start, y=value, group=variable,
                    color=variable,shape=variable)) + 
           geom_line() + theme_bw() + 
           ggtitle("Resampled CUSUM of the data post the application of the median filter") + 
           theme(legend.position="none",
                 #axis.text.x=element_text(angle=90,hjust=1),
                 plot.title = element_text(hjust = 0.5)) +
           facet_wrap(~variable,nrow=3, scales = "free_y")
         )
  
  
  cat('###',paste0("P", i), "{-}",'\n')
  
  print(get(paste0("g",i)))
  cat('\n')
  
  print(get(paste0("h",i)))
  cat('\n')
  
  print(get(paste0("I",i)))
  cat('\n')
  
  cat('\n \n')

}

P1

P2

P3

P4

P5

P6

P7

P8

P9

P10

P11

P12

P13

P14

P15

  cat(paste0('</b> </font> The normalized data can be found at: <a href="https://github.com/fmegahed/fatigue-changepoint/blob/master/Data/RGenerated/NormMedianFilteredData.RData">NormMedianFilteredData.RData</a> and <a 
href="https://github.com/fmegahed/fatigue-changepoint/blob/master/Data/RGenerated/NormCusumData.RData">NormCusumData.RData</a> </p> </source>')
  )
The normalized data can be found at: NormMedianFilteredData.RData and NormCusumData.RData

save(medianfilt_norm,
     file = "./Data/RGenerated/NormMedianFilteredData.RData")

save(cusum_norm,
     file = "./Data/RGenerated/NormCusumData.RData")

7.2 Time-Series Clustering

To perform time-series clustering, we can do the following:

  1. univariate time-series clustering:
    • using the first PC of the scaled and then median filtered data; or
    • using the first PC of the scaled, median filtered and cusumed data;
  2. multivariate time-series clustering:
    • using the median filtered scaled data;
    • using the scaled, median filtered and then cusumed data.

We show the results from each analysis in the following subsections.

7.2.1 Univariate TimeSeries Clustering based on the First PC of the Scaled Sensor Data

load(file = "./Data/RGenerated/NormMedianFilteredData.RData")
load(file = "./Data/RGenerated/NormCusumData.RData")
# medianfilt_norm, cusum_norm

pca_subjects_loadings_scaled <- list()
pca_subjects_scaled <- data.frame(matrix(nrow=2000, ncol=15))
pc_prop_var_explained_scaled <- vector()

for (i in 1:15) {
  df <- get(paste0("subject",i,"_medianf_norm"))
  pca_dataholder <-  prcomp(df[,2:4], center=TRUE, scale. = T) # Scale <- Standardize Vars
  pca_subjects_loadings_scaled[[i]] <- pca_dataholder$rotation[,1] # Weights for First PC
  # Understanding the % Variation Explained by the First PC
  pc_std <- pca_dataholder$sdev
  pc_var <- pc_std^2
  pc_prop_var_explained_scaled[i] <- round(pc_var[1]/sum(pc_var),3)
  #  Storing the Data for First PC rotated values
  pca_subjects_scaled[,i] <- pca_dataholder$x[,1] # First PC
}

distances <- diss(pca_subjects_scaled, "DTWARP")
dimnames(distances) <- paste0("P",seq(1,15,1))

cat(paste0('<source> <p> Based on the above analysis, the % variation explained by the first principal component of the scaled median filtered data for each participant is equal to: ', paste(round(pc_prop_var_explained_scaled, 2), collapse = ", "), '. To understand how the underlying multivariate time series methods are similar, we applied a heirarchical clustering algorithm based on the distances obtained using the <i> Dynamic Time Warping Method </i>. The resulting dendogram is shown in the figure below. </p> </source>'))

Based on the above analysis, the % variation explained by the first principal component of the scaled median filtered data for each participant is equal to: 0.75, 0.8, 0.56, 0.88, 0.54, 0.68, 0.4, 0.74, 0.5, 0.78, 0.83, 0.54, 0.48, 0.76, 0.73. To understand how the underlying multivariate time series methods are similar, we applied a heirarchical clustering algorithm based on the distances obtained using the Dynamic Time Warping Method . The resulting dendogram is shown in the figure below.

cat("\n")
# distances.matrix <- as.matrix(distances)
# heatmap(distances.matrix)


clusters <- hclust(distances)

plot(clusters, hang=0.1, check = TRUE, 
     axes = TRUE, ann = TRUE, ylab="Height",
     main = "Cluster Dendogram based on the DTWARP distances")

cat("\n")
cat(paste0('Based on the above dendogram, one can see that the best choice for total number of clusters is equal to: 3 or 4. This number was then used to cut of the dendogram tree using the  cutree function in R. We show the resulting cluster membership in the table below.'))

Based on the above dendogram, one can see that the best choice for total number of clusters is equal to: 3 or 4. This number was then used to cut of the dendogram tree using the cutree function in R. We show the resulting cluster membership in the table below.

cat("\n")
clusterCut3 <- cutree(clusters, 3)
clusterCut4 <- cutree(clusters, 4)

cat(paste0('The associated cluster assignments for both thresholds are as follows: \n', 'When using k=3, the assignments are as follows: \n'))

The associated cluster assignments for both thresholds are as follows: When using k=3, the assignments are as follows:

kable(t(clusterCut3), row.names = NA, caption = "Cluster Assignment for each Participant | k=3") %>% column_spec(1, bold = T, width = "5em") %>% kable_styling(bootstrap_options = "striped", full_width = F)
Cluster Assignment for each Participant | k=3
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15
1 1 2 1 1 1 3 3 3 1 1 3 3 3 1
cat(paste0(' \n', 'On the other hand, when k=4, the following assignments were made: \n'))

On the other hand, when k=4, the following assignments were made:

kable(t(clusterCut4), row.names = NA, caption = "Cluster Assignment for each Participant | k=4") %>% column_spec(1, bold = T, width = "5em") %>% kable_styling(bootstrap_options = "striped", full_width = F)
Cluster Assignment for each Participant | k=4
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15
1 2 3 2 1 1 4 4 4 2 1 4 4 4 2
cat("\n")
save(clusterCut3, clusterCut4, pc_prop_var_explained_scaled,
     file = "./Data/RGenerated/ScaledPcaClusters.RData")

8 Comments

From B and C, we are hoping to develop a general diagnosis/understanding of how fatigue develops (e.g., we start with a ramp up phase, then steady state, then fatigue starting, and then person is totally fatigued).


  1. Department of Mechanical and Aerospace Engineering, University at Buffalo.

  2. Department of Industrial and Systems Engineering, University at Buffalo.

  3. Farmer School of Business, Miami University.

  4. College for Public Health and Social Justice, Saint Louis University

  5. Farmer School of Business, Miami University. Corresponding author. fmegahed@miamioh.edu.